Methods for Production of Synthetic Promoters with Defined Specificity

ABSTRACT

The present invention relates to methods for the design and production of synthetic promoters with a defined specificity and promoters produced with these methods.

FIELD OF THE INVENTION

The present invention relates to methods for the design and production of synthetic promoters with a defined specificity and promoters produced with these methods.

BACKGROUND OF THE INVENTION

Manipulation of plants to alter and/or improve phenotypic characteristics such as productivity or quality requires expression of heterologous genes in plant tissues. Such genetic manipulation relies on the availability of a means to drive and to control gene expression as required. For example, genetic manipulation relies on the availability and use of suitable promoters which are effective in plants and which regulate gene expression so as to give the desired effect(s) in the transgenic plant.

Advanced traits often require the coordinated expression of more than one gene in a transgenic plant. For example, the production of polyunsaturated fatty acids such as arachidonic acid in a plant requires expression of at least 5 genes. There is also increasing demand for trait stacking, which requires the combination of more than one gene in transgenic plants.

The availability of suitable promoters for such coordinated expression is limited. Promoters would often need to have the same tissue and/or developmental specificity and preferably comparable expression strength. One solution has been to use the same promoter for the expression of several genes. Expression constructs comprising more than one expression cassette with tandem or inverted sequence repeats of, for example, a promoter cause various problems. When located on one vector, handling of the vector in bacteria for cloning, amplification and transformation is difficult due to recombination events which lead to the loss and/or rearrangement of part of the expression construct. Moreover, sequence verification of constructs comprising repeated sequences is difficult and sometimes impossible. A further problem of such expression constructs comprising repeats of the same promoter sequence is that recombination may also occur after introduction into the genome of the target organism such as a plant.

Additionally, it is well known that repeated promoter sequences in the genome of organisms such as a plant may induce silencing of expression derived from these promoters, for example by methylation of the promoter or increase of chromatin density at the site of the promoters, which makes the promoter inaccessible for transcription factors.

The use of different promoters in expression constructs comprising more than one expression cassette is one possibility to circumvent these problems. However, isolation and analysis of promoters is laborious and time consuming. It is unpredictable what expression pattern and expression strength an isolated promoter will have, and hence a high number of promoters need to be tested in order to find at least two promoters with comparable expression pattern and optionally comparable expression strength.

There is, therefore, a great need in the art for the availability of new sequences that may be used for expression of selected transgenes in economically important plants. It is thus an objective of the present invention to provide new methods for the production of synthetic promoters with identical and/or overlapping expression pattern or expression specificity and optionally similar expression strength. This objective is solved by the present invention.

DETAILED DESCRIPTION OF THE INVENTION

A first embodiment of the invention is a method for the production of one or more synthetic regulatory nucleic acid molecules of a defined specificity comprising the steps of

-   a) identifying at least one naturally occurring nucleic acid molecule of the defined specificity (starting molecule) and
-   b) identifying conserved motives in the at least one nucleic acid sequence (starting sequence) of the starting molecule of the defined specificity as defined in a) and
-   c) mutating the starting sequence while
    -   i) leaving unaltered at least 70%, preferably 80%, 85% or 90%, more preferably at least 95%, even more preferably at least 98% or at least 99%, for example 100%, of the motives known to be involved in regulation of the respective defined specificity (also called preferentially associated motives) and
    -   ii) leaving unaltered at least 80%, preferably at least 90% or 95%, for example 100%, of the motives involved in transcription initiation (also called essential motives) and
    -   iii) leaving unaltered at least 10%, preferably at least 20%, 30%, 40% or 50%, more preferably at least 60%, 70% or 80%, even more preferably at least 90% or 95% of other identified motives (also called non exclusively associated motives) and
    -   iv) keeping the arrangement of the identified motives substantially unchanged and
    -   v) avoiding the introduction of new motives known to influence expression with another specificity than said defined specificity and
    -   vi) avoiding identical stretches of more than 50 basepairs, preferably 45 basepairs, more preferably 40 basepairs, most preferably 35 basepairs, for example 30 basepairs between each of the starting sequence and the one or more mutated sequences and
-   d) producing a nucleic acid molecule comprising the mutated sequence and
-   e) optionally testing the specificity of the mutated sequence in the respective organism.
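The retention thresholds of steps c) i) to iii) can be illustrated with a minimal Python sketch. It is not part of the claimed method: the `Motif` record, the class labels and the function names are hypothetical, the sequences are assumed to have equal length (substitutions only), and "unaltered" is approximated as an exact sequence match, which is stricter than the functional definition given for essential motives.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Motif:
    name: str
    start: int        # 0-based position in the starting sequence
    length: int
    motif_class: str  # "preferential", "essential" or "other" (hypothetical labels)

# Broadest minimum fractions from steps c) i), ii) and iii).
MIN_RETAINED = {"preferential": 0.70, "essential": 0.80, "other": 0.10}

def retained_fraction(start_seq, mutated_seq, motifs, motif_class):
    """Fraction of motives of one class whose sequence is identical in both sequences."""
    members = [m for m in motifs if m.motif_class == motif_class]
    if not members:
        return 1.0  # nothing to preserve in this class
    kept = sum(
        start_seq[m.start:m.start + m.length] == mutated_seq[m.start:m.start + m.length]
        for m in members
    )
    return kept / len(members)

def passes_retention(start_seq, mutated_seq, motifs):
    """True if every motif class meets its minimum retained fraction."""
    return all(
        retained_fraction(start_seq, mutated_seq, motifs, cls) >= threshold
        for cls, threshold in MIN_RETAINED.items()
    )
```

A stricter embodiment (for example 95% of preferentially associated motives) would only require different values in `MIN_RETAINED`.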

In one embodiment of the invention, additional preferentially associated motives may be introduced into the sequence of the synthetic nucleic acid molecule.

Production of the nucleic acid molecule comprising the mutated sequence could for example be done by chemical synthesis or by oligo ligation, whereby smaller oligos comprising parts of the sequence of the invention are stepwise annealed and ligated to form the nucleic acid molecule of the invention.

In a preferred embodiment of the invention, the synthetic regulatory nucleic acid molecule is a synthetic promoter; in a more preferred embodiment the synthetic regulatory nucleic acid molecule is a synthetic promoter functional in a plant, plant tissue or plant cell.

The at least one starting molecule comprising the starting sequence may for example be identified by searches in literature or internet resources such as sequence and/or gene expression databases. The at least one starting molecule comprising the starting sequence may in another example be identified by isolation and characterization of a naturally occurring promoter from the respective organism, for example plants, algae, fungi, animals and the like. Such methods are well known to a person skilled in the art and are for example described in Back et al., 1991, Keddie et al., 1992, Keddie et al., 1994.

Motives in a series of nucleic acid molecules may be identified by a variety of bioinformatic tools available in the art. For example, see Hehl and Wingender, 2001, Hehl and Bulow, 2002, Cartharius et al., 2005, Kaplan et al., 2006, Dare et al., 2008.

In addition, there are various databases available that are specialized in promoter analysis and motif prediction in any given sequence, for example as reviewed in Hehl and Wingender, 2001. It is also possible to identify motives necessary for regulation of expression of the defined specificity with experimental methods known to a skilled person. Such methods are for example deletion or mutation analysis of the respective starting sequence as for example described in Montgomery et al., 1993.

Essential motives known to be involved in transcription initiation, for example by being bound by general initiation factors and/or RNA polymerases as described above under ii), are for example the TATA box, the CCAAT box, the GC box or other functionally similar motives as for example identified in Roeder (1996, Trends in Biochemical Science, 21(9)) or Baek et al. (2006, Journal of Biological Chemistry, 281). These motives allow a certain degree of degeneration or variation of their sequence without changing or destroying their functionality in initiation of transcription. The skilled person is aware of such sequence variations that leave the respective motives functional. Such variations are for example given in the Transfac database as described by Matys et al. ((2003) NAR 31 (1)) and literature given therein. The Transfac database may for example be accessed via ftp://ftp.ebi.ac.uk/pub/databases/transfac/transfac32.tar.Z. Hence it is to be understood that the term “leaving motives unaltered involved in transcription initiation” means that the respective motives may be mutated, hence altered in their sequence, as long as their respective function, which is enabling initiation of transcription, is not altered, hence as long as the essential motives are functional. In another embodiment of the invention the first 49, preferably 44, more preferably 39, even more preferably 34, most preferably 29 bp directly upstream of the transcription initiation site are kept unaltered.

The term “keeping the arrangement of the motives unchanged” as used above under iv) means that the order of the motives and/or the distance between the motives are kept substantially unchanged, preferably unchanged. Substantially unchanged means that the distance between two motives in the synthetic regulatory nucleic acid sequence is not longer or shorter than the distance between these motives in the starting sequence by more than 100%, for example 90%, 80% or 70%, preferably 60%, 50% or 40%, more preferably not more than 30% or 20%, most preferably not more than 10%. Preferably the distance between two motives in the starting sequence differs by not more than 10, preferably 9, more preferably 8 or 7 or 6 or 5 or 4, even more preferably not more than 3 or 2, most preferably not more than 1 basepairs from the distance in the permutated sequence.

Inverted and/or direct stretches of repeated sequences may lead to the formation of secondary structures in plasmids or genomic DNA. Repeated sequences may lead to recombination, deletion and/or rearrangement in the plasmid both in E. coli and Agrobacterium. In eukaryotic organisms, for example plants, repeated sequences also tend to be silenced by methylation. Recombination events which lead to deletions or rearrangements of one or more expression cassettes and/or T-DNAs are likely to lead to loss of function, for example loss of expression of such constructs in the transgenic plant (Que and Jorgensen, 1998, Hamilton et al., 1998). It is therefore a critical feature of the invention at hand to avoid identical stretches of 50 basepairs, preferably 45 basepairs, more preferably 40 basepairs, most preferably 35 basepairs, for example 30 basepairs between each of the starting sequence and the one or more permutated sequences. In case of the production of more than one permutated sequence, said identical stretches must be avoided between the starting sequence and each of the permutated sequences in a pairwise comparison. In another embodiment, such identical stretches must be avoided between all permutated sequences and the starting sequence; hence none of the permutated and starting sequences shares such identical stretches with any of the other sequences.
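The pairwise comparison for identical stretches can be sketched in Python as a longest-common-substring check. This is an illustration only: the function names are hypothetical, the limit is read as "no shared stretch longer than `max_len` basepairs" (an assumption), and only the same strand is compared, so inverted repeats would need an additional comparison against the reverse complement.

```python
from itertools import combinations

def longest_identical_stretch(a, b):
    """Length of the longest contiguous stretch shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous position of both sequences.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def violates_stretch_limit(sequences, max_len=50):
    """True if any pair of sequences shares an identical stretch longer than max_len."""
    return any(
        longest_identical_stretch(a.upper(), b.upper()) > max_len
        for a, b in combinations(sequences, 2)
    )
```

Passing the starting sequence together with all permutated sequences to `violates_stretch_limit` corresponds to the embodiment in which no sequence shares such a stretch with any other sequence.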

The skilled person is aware that regulatory nucleic acids may comprise promoters and, functionally linked to said promoters, a 5′UTR, which may comprise at least one intron. It has been shown that introns may lead to increased expression levels derived from the promoter to which the 5′UTR comprising the intron is functionally linked. The 5′UTR and the intron may be altered in their sequence as described, wherein the splice sites and putative branching point are not altered in order to ensure correct splicing of the intron after permutation. No nucleotide exchanges are introduced into the sequences at least 2, preferably at least 3, more preferably at least 5, even more preferably at least 10 bases up- and downstream of the splice sites (5′ GT; 3′ CAG); these sequences are kept unchanged. In addition, “CURAY” and “TNA” sequence elements, being potential branching points of the intron, are kept unchanged within the last 200 base pairs, preferably the last 150 base pairs, more preferably the last 100 base pairs, even more preferably the last 75 base pairs of the respective intron.
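One way to locate the potential branching point elements is a pattern search over the end of the intron, sketched below in Python. The sketch assumes that “CURAY” and “TNA” are IUPAC patterns (R = A or G, Y = C or T, N = any base, U written as T in DNA); the function name and the returned set of 0-based positions are hypothetical.

```python
import re

# Lookahead patterns so that overlapping occurrences are all reported.
BRANCH_PATTERNS = {
    "CURAY": re.compile(r"(?=(CT[AG]A[CT]))"),
    "TNA": re.compile(r"(?=(T[ACGT]A))"),
}

def protected_branch_positions(intron, window=200):
    """Return 0-based intron positions covered by a CURAY or TNA element in the last `window` bases."""
    intron = intron.upper().replace("U", "T")
    offset = max(0, len(intron) - window)
    tail = intron[offset:]
    protected = set()
    for pattern in BRANCH_PATTERNS.values():
        for match in pattern.finditer(tail):
            start = offset + match.start(1)
            protected.update(range(start, start + len(match.group(1))))
    return protected
```

Because “TNA” is short, it matches frequently; a mutation routine that skips every returned position will therefore leave a large share of the window untouched.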

The 5′UTR may be permutated according to the rules as defined above, wherein preferably at least 25, more preferably at least 20, even more preferably at least 15, for example at least 10, most preferably at least 5 base pairs up- and downstream of the transcription start are kept unchanged. The AT content of both the 5′UTR and the intron is not changed by more than 20%, preferably not more than 15%, for example 10% or 5% compared to the AT content of the starting sequence.
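The AT content rule can be checked with a few lines of Python. The sketch assumes the percentage refers to a relative change with respect to the starting sequence; if an absolute change in percentage points is meant, the division by the reference value would be dropped. Function names are hypothetical.

```python
def at_content(seq):
    """Fraction of A and T bases in a DNA sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq) if seq else 0.0

def at_content_within_limit(start_seq, mutated_seq, max_change=0.20):
    """True if the AT content changed by at most max_change relative to the starting sequence."""
    reference = at_content(start_seq)
    if reference == 0.0:
        return at_content(mutated_seq) == 0.0
    return abs(at_content(mutated_seq) - reference) / reference <= max_change
```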

A further embodiment of the invention is a synthetic regulatory nucleic acid molecule produced according to the method of the invention.

An expression construct comprising the said synthetic regulatory nucleic acid molecule is another embodiment of the invention.

A vector comprising the regulatory nucleic acid molecule or the expression construct of the invention is also comprised in this invention, as well as microorganisms, plant cells or animal cells comprising the regulatory nucleic acid molecule, the expression construct and/or the vector of the invention.

A further embodiment of the invention is a plant, plant seed, plant cell or part of a plant comprising the regulatory nucleic acid molecule, the expression construct and/or the vector of the invention.

A further embodiment of the invention are exemplary recombinant seed specific or seed preferential synthetic regulatory nucleic acid molecules produced according to the method of the invention wherein the regulatory nucleic acid molecule is comprised in the group consisting of

-   I) a nucleic acid molecule represented by SEQ ID NO: 2, 4 or 6 and
-   II) a nucleic acid molecule comprising at least 1000 consecutive base pairs, for example 1000 base pairs, preferably at least 800 consecutive base pairs, for example 800 base pairs, more preferably at least 700 consecutive base pairs, for example 700 base pairs, even more preferably at least 600 consecutive base pairs, for example 600 base pairs, most preferably at least 500 consecutive base pairs, for example 500 base pairs or at least 400, at least 300, at least 250 for example 400, 300 or 250 base pairs of a sequence described by SEQ ID NO: 2, 4 or 6 and
-   III) a nucleic acid molecule having an identity of at least 70%, for example at least 75%, 76%, 77%, 78%, 79%, preferably at least 80%, for example at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, more preferably 90%, for example at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, even more preferably 98%, most preferably 99% over a sequence of at least 250, 300, 400, 500, 600, preferably 700, more preferably 800, even more preferably 900, most preferably 1000 consecutive nucleic acid base pairs to a sequence described by SEQ ID NO: 2, 4 or 6 and
-   IV) a nucleic acid molecule having an identity of at least 70%, for example at least 75%, 76%, 77%, 78%, 79%, preferably at least 80%, for example at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, more preferably 90%, for example at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, even more preferably 98%, most preferably 99% to a sequence consisting of at least 50%, 60%, 70%, 80%, 90% or 100% of any of the sequences described by SEQ ID NO: 2, 4 or 6 and
-   V) a nucleic acid molecule hybridizing under high stringent, preferably very high stringent conditions with a nucleic acid molecule of at least 250, 300, 400, 500, 600, 700, 800, 900, 1000 or the complete consecutive base pairs of a nucleic acid molecule described by any of SEQ ID NO: 2, 4 or 6 and
-   VI) a complement of any of the nucleic acid molecules as defined in I) to V).

Another embodiment of the invention are exemplary recombinant seed specific or seed preferential synthetic regulatory nucleic acid molecules produced according to the method of the invention wherein the regulatory nucleic acid molecule is comprised in the group consisting of

-   i) a nucleic acid molecule represented by SEQ ID NO: 2, 4 or 6 and
-   ii) a nucleic acid molecule comprising at least 1000 consecutive base pairs, for example 1000 base pairs, preferably at least 800 consecutive base pairs, for example 800 base pairs, more preferably at least 700 consecutive base pairs, for example 700 base pairs, even more preferably at least 600 consecutive base pairs, for example 600 base pairs, most preferably at least 500 consecutive base pairs, for example 500 base pairs or at least 400, at least 300, at least 250 for example 400, 300 or 250 base pairs of a sequence described by SEQ ID NO: 2, 4 or 6 and
-   iii) a nucleic acid molecule having an identity of at least 75% over a sequence of at least 250, 300, 400, 500, 600, preferably 700, more preferably 800, even more preferably 900, most preferably 1000 or the complete consecutive nucleic acid base pairs to a sequence described by SEQ ID NO: 6,
-   iv) a nucleic acid molecule having an identity of at least 90% over a sequence of at least 250, 300, 400, 500, 600, preferably 700, more preferably 800, even more preferably 900, most preferably 1000 or the complete consecutive nucleic acid base pairs to a sequence described by SEQ ID NO: 2 or 4 and
-   v) a nucleic acid molecule hybridizing under high stringent, preferably very high stringent conditions with a nucleic acid molecule of at least 250, 300, 400, 500, 600, 700, 800, 900, 1000 or the complete consecutive base pairs of a nucleic acid molecule described by any of SEQ ID NO: 2, 4 or 6 and
-   vi) a complement of any of the nucleic acid molecules as defined in i) to v).

Further embodiments of the invention are exemplary recombinant constitutive regulatory nucleic acid molecules produced according to the method of the invention wherein the regulatory nucleic acid molecule is comprised in the group consisting of

-   I) a nucleic acid molecule represented by SEQ ID NO: 14 or 15 and
-   II) a nucleic acid molecule comprising at least 1750, 1500, 1250 or 1000 consecutive base pairs, for example 1000 base pairs, preferably at least 800 consecutive base pairs, for example 800 base pairs, more preferably at least 700 consecutive base pairs, for example 700 base pairs, even more preferably at least 600 consecutive base pairs, for example 600 base pairs, most preferably at least 500 consecutive base pairs, for example 500 base pairs or at least 400, at least 300, at least 250 for example 400, 300 or 250 base pairs of a sequence described by SEQ ID NO: 14 or 15 and
-   III) a nucleic acid molecule having an identity of at least 70%, for example at least 75%, 76%, 77%, 78%, 79%, preferably at least 80%, for example at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, more preferably 90%, for example at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, even more preferably 98%, most preferably 99% over a sequence of at least 250, 300, 400, 500, 600, preferably 700, more preferably 800, even more preferably 900, for example 1000, most preferably 1250, for example 1500 or 1750 or 2000 consecutive nucleic acid base pairs to a sequence described by SEQ ID NO: 14 or 15 and
-   IV) a nucleic acid molecule having an identity of at least 70%, for example at least 75%, 76%, 77%, 78%, 79%, preferably at least 80%, for example at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, more preferably 90%, for example at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, even more preferably 98%, most preferably 99% to a sequence consisting of at least 50%, 60%, 70%, 80%, 90% or 100% of any of the sequences described by SEQ ID NO: 14 or 15 and
-   V) a nucleic acid molecule hybridizing under high stringent, preferably very high stringent conditions with a nucleic acid molecule of at least 250, 300, 400, 500, 600, 700, 800, 900, 1000, 1250, 1500, 1750 or 2000 or the complete consecutive base pairs of a nucleic acid molecule described by any of SEQ ID NO: 14 or 15 and
-   VI) a complement of any of the nucleic acid molecules as defined in I) to V).

Another embodiment of the invention are exemplary recombinant constitutive synthetic regulatory nucleic acid molecules produced according to the method of the invention wherein the regulatory nucleic acid molecule is comprised in the group consisting of

-   i) a nucleic acid molecule represented by SEQ ID NO: 14 or 15 and
-   ii) a nucleic acid molecule comprising at least 2000, 1750, 1500, 1250 or 1000 consecutive base pairs, for example 1000 base pairs, preferably at least 800 consecutive base pairs, for example 800 base pairs, more preferably at least 700 consecutive base pairs, for example 700 base pairs, even more preferably at least 600 consecutive base pairs, for example 600 base pairs, most preferably at least 500 consecutive base pairs, for example 500 base pairs or at least 400, at least 300, at least 250 for example 400, 300 or 250 base pairs of a sequence described by SEQ ID NO: 14 or 15 and
-   iii) a nucleic acid molecule having an identity of at least 95%, preferably 97%, more preferably 98%, most preferably 99% over a sequence of at least 250, 300, 400, 500, 600, preferably 700, more preferably 800, even more preferably 900, for example 1000, most preferably 1500, for example 2000 or the complete consecutive nucleic acid base pairs to a sequence described by SEQ ID NO: 14 or 15,
-   iv) a nucleic acid molecule hybridizing under high stringent, preferably very high stringent conditions with a nucleic acid molecule of at least 250, 300, 400, 500, 600, 700, 800, 900, 1000, 1250, 1500, 1750, 2000 or the complete consecutive base pairs of a nucleic acid molecule described by any of SEQ ID NO: 14 or 15 and
-   v) a complement of any of the nucleic acid molecules as defined in i) to iv).

It is to be understood that the group of exemplary recombinant seed specific or seed preferential or constitutive synthetic regulatory nucleic acid molecules produced according to the method of the invention as defined above under I) to V) and i) to vi) does not comprise the starting molecules as defined by SEQ ID NO: 1, 3, 5 and 13 or a complement thereof or a nucleic acid molecule having at least 250 consecutive base pairs of a sequence described by SEQ ID NO: 1, 3, 5 or 13 or a complement thereof or any other nucleic acid molecule occurring in a wild type plant, as such nucleic acid molecules are molecules that are not produced according to the invention but are naturally present in wild type plants.

An expression construct comprising any of said synthetic regulatory nucleic acid molecules as defined above under I) to VI) and i) to vi) is another embodiment of the invention.

A vector comprising the regulatory nucleic acid molecule or the expression construct of the invention is also comprised in this invention, as well as microorganisms, plant cells or animal cells comprising the regulatory nucleic acid molecule, the expression construct and/or the vector of the invention.

A further embodiment of the invention is a plant, plant seed, plant cell or part of a plant comprising the regulatory nucleic acid molecule, the expression construct and/or the vector of the invention.

DEFINITIONS

Abbreviations: GFP: green fluorescent protein; GUS: beta-glucuronidase; BAP: 6-benzylaminopurine; 2,4-D: 2,4-dichlorophenoxyacetic acid; MS: Murashige and Skoog medium; NAA: 1-naphthaleneacetic acid; MES: 2-(N-morpholino)ethanesulfonic acid; IAA: indole acetic acid; Kan: kanamycin sulfate; GA3: gibberellic acid; Timentin™: ticarcillin disodium/clavulanate potassium.

It is to be understood that this invention is not limited to the particular methodology or protocols. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention, which will be limited only by the appended claims. It must be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural reference unless the context clearly dictates otherwise. Thus, for example, reference to “a vector” is a reference to one or more vectors and includes equivalents thereof known to those skilled in the art, and so forth. The term “about” is used herein to mean approximately, roughly, around, or in the region of. When the term “about” is used in conjunction with a numerical range, it modifies that range by extending the boundaries above and below the numerical values set forth. In general, the term “about” is used herein to modify a numerical value above and below the stated value by a variance of 20 percent, preferably 10 percent up or down (higher or lower). As used herein, the word “or” means any one member of a particular list and also includes any combination of members of that list. The words “comprise,” “comprising,” “include,” “including,” and “includes” when used in this specification and in the following claims are intended to specify the presence of one or more stated features, integers, components, or steps, but they do not preclude the presence or addition of one or more other features, integers, components, steps, or groups thereof. For clarity, certain terms used in the specification are defined and used as follows:

Antiparallel: “Antiparallel” refers herein to two nucleotide sequences paired through hydrogen bonds between complementary base residues with phosphodiester bonds running in the 5′-3′ direction in one nucleotide sequence and in the 3′-5′ direction in the other nucleotide sequence.

Antisense: The term “antisense” refers to a nucleotide sequence that is inverted relative to its normal orientation for transcription or function and so expresses an RNA transcript that is complementary to a target gene mRNA molecule expressed within the host cell (e.g., it can hybridize to the target gene mRNA molecule or single stranded genomic DNA through Watson-Crick base pairing) or that is complementary to a target DNA molecule such as, for example, genomic DNA present in the host cell.

“Box” or as synonymously used herein “motif” or “cis-element” of a promoter means a transcription factor binding sequence defined by a highly conserved core sequence of approximately 4 to 6 nucleotides surrounded by a conserved matrix sequence of in total up to 20 nucleotides within the plus or minus strand of the promoter, which is capable of interacting with the DNA binding domain of a transcription factor protein. The conserved matrix sequence allows some variability in the sequence without losing its ability to be bound by the DNA binding domain of a transcription factor protein.

One way to describe transcription factor binding sites (TFBS) is by nucleotide or position weight matrices (NWM or PWM) (for review see Stormo, 2000). A weight matrix pattern definition is superior to a simple IUPAC consensus sequence as it represents the complete nucleotide distribution for each single position. It also allows the quantification of the similarity between the weight matrix and a potential TFBS detected in the sequence (Cartharius et al. 2005).
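The idea of a position weight matrix can be illustrated with a small Python sketch. It uses a generic log-odds formulation with a pseudocount and a uniform background, which is an assumption and not necessarily the scoring scheme of the tools cited above; the function names and any example binding sites are hypothetical.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix (one dict per position) from aligned, equal-length sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def score(pwm, window):
    """Sum of the per-position log-odds weights for one candidate site."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    return max((score(pwm, seq[i:i + width]), i) for i in range(len(seq) - width + 1))
```

Unlike an IUPAC consensus, the score is graded: a window that deviates at a weakly conserved position loses less than one that deviates at a strongly conserved position.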

Coding region: As used herein the term “coding region” when used in reference to a structural gene refers to the nucleotide sequences which encode the amino acids found in the nascent polypeptide as a result of translation of a mRNA molecule. The coding region is bounded, in eukaryotes, on the 5′-side by the nucleotide triplet “ATG” which encodes the initiator methionine and on the 3′-side by one of the three triplets which specify stop codons (i.e., TAA, TAG, TGA). In addition to containing introns, genomic forms of a gene may also include sequences located on both the 5′- and 3′-end of the sequences which are present on the RNA transcript. These sequences are referred to as “flanking” sequences or regions (these flanking sequences are located 5′ or 3′ to the non-translated sequences present on the mRNA transcript). The 5′-flanking region may contain regulatory sequences such as promoters and enhancers which control or influence the transcription of the gene. The 3′-flanking region may contain sequences which direct the termination of transcription, post-transcriptional cleavage and polyadenylation.

Complementary: “Complementary” or “complementarity” refers to two nucleotide sequences which comprise antiparallel nucleotide sequences capable of pairing with one another (by the base-pairing rules) upon formation of hydrogen bonds between the complementary base residues in the antiparallel nucleotide sequences. For example, the sequence 5′-AGT-3′ is complementary to the sequence 5′-ACT-3′. Complementarity can be “partial” or “total.” “Partial” complementarity is where one or more nucleic acid bases are not matched according to the base pairing rules. “Total” or “complete” complementarity between nucleic acid molecules is where each and every nucleic acid base is matched with another base under the base pairing rules. The degree of complementarity between nucleic acid molecule strands has significant effects on the efficiency and strength of hybridization between nucleic acid molecule strands. A “complement” of a nucleic acid sequence as used herein refers to a nucleotide sequence whose nucleic acid molecules show total complementarity to the nucleic acid molecules of the nucleic acid sequence.
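The 5′-AGT-3′ / 5′-ACT-3′ example follows from reading the complementary strand in its own 5′-3′ direction, which a short Python sketch makes explicit. The function names are hypothetical and only unambiguous DNA bases are handled.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the complementary strand written 5'-3' (complement each base, then reverse)."""
    return seq.translate(_COMPLEMENT)[::-1]

def is_total_complement(a, b):
    """True if b is totally complementary to a, both written 5'-3'."""
    return reverse_complement(a).upper() == b.upper()
```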

Conserved motives: A conserved motif as used herein means a sequence motif or box found in various promoters having the same or overlapping specificity. Overlapping specificity means the specificity of at least two promoters wherein the expression derived from one promoter occurs in part or completely in the same tissue, for example, as that of the other promoter, wherein the latter one may drive expression in additional tissues in which the first promoter may not drive expression. Motives may be grouped in three classes:

Essential: motives present in the promoters of most genes that are transcribed by RNA Polymerase II and which are preferentially localized close to the transcription start site. Such motives must not be made dysfunctional by mutations according to the method of the invention. Hence they must not be altered in a way that prevents them from being bound by the respective DNA binding domain of the transcription factor protein that would have bound to the unaltered sequence.

Non exclusively associated: motives present in the promoters of genes that are associated with certain tissues/physiological states/treatments but not exclusively; these genes may also be expressed in other tissues/physiological states/treatments. According to the method of the invention, such motives should preferably not be made dysfunctional by mutations, or at least only a certain percentage of such motives present in one particular promoter or starting sequence should be made dysfunctional. Hence they should preferably not be altered in a way that prevents them from being bound by the respective DNA binding domain of the transcription factor protein that would have bound to the unaltered sequence.

Preferentially associated: motifs present in the promoters of genes that are expressed preferentially in specific tissues/physiological states/treatments. The vast majority of such motifs identified in a starting sequence must not be made dysfunctional by mutations according to the method of the invention. Hence they must not be altered in a way that prevents them from being bound by the respective DNA binding domain of the transcription factor protein that would have bound to the unaltered sequence.

Defined specificity: the term “defined specificity” means any expression specificity of a promoter, preferably a plant specific promoter, which is beneficial for the expression of a distinct coding sequence or RNA. A defined specificity may for example be a tissue or developmental specificity or the expression specificity could be defined by induction or repression of expression by biotic or abiotic stimuli or a combination of any of these.

Double-stranded RNA: A “double-stranded RNA” molecule or “dsRNA” molecule comprises a sense RNA fragment of a nucleotide sequence and an antisense RNA fragment of the nucleotide sequence, which both comprise nucleotide sequences complementary to one another, thereby allowing the sense and antisense RNA fragments to pair and form a double-stranded RNA molecule.

Endogenous: An “endogenous” nucleotide sequence refers to a nucleotide sequence, which is present in the genome of the untransformed plant cell.

Expression: “Expression” refers to the biosynthesis of a gene product, preferably to the transcription and/or translation of a nucleotide sequence, for example an endogenous gene or a heterologous gene, in a cell. For example, in the case of a structural gene, expression involves transcription of the structural gene into mRNA and, optionally, the subsequent translation of mRNA into one or more polypeptides. In other cases, expression may refer only to the transcription of the DNA harboring an RNA molecule. Expression may also refer to the change of the steady state level of the respective RNA in a plant or part thereof due to change of the stability of the respective RNA.

Similar expression strength: Two or more regulatory nucleic acid molecules have a similar expression strength when the expression derived from any of the regulatory nucleic acid molecules in a distinct cell, tissue or plant organ does not deviate by more than a factor of 2.
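
The factor-2 criterion may, purely as an illustrative sketch, be checked as follows. The function name `similar_expression_strength` and the use of strictly positive expression values measured in the same cell, tissue or plant organ (for example reporter activities) are assumptions made for this example and are not prescribed by the definition.

```python
def similar_expression_strength(levels, max_factor=2.0):
    """Return True if no two expression levels differ by more than `max_factor`.

    `levels` holds one measurement per regulatory nucleic acid molecule,
    all taken in the same cell, tissue or plant organ (assumption: values > 0).
    """
    values = list(levels)
    if len(values) < 2:
        raise ValueError("at least two expression levels are required")
    if any(v <= 0 for v in values):
        raise ValueError("expression levels must be positive")
    # The largest pairwise deviation is the ratio of the extremes.
    return max(values) / min(values) <= max_factor
```

Under these assumptions, levels of 10, 18 and 20 would count as similar (largest ratio 2), whereas levels of 10 and 25 would not.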

Expression construct: “Expression construct” as used herein means a DNA sequence capable of directing expression of a particular nucleotide sequence in an appropriate part of a plant or plant cell, comprising a promoter functional in said part of a plant or plant cell into which it will be introduced, operatively linked to the nucleotide sequence of interest which is, optionally, operatively linked to termination signals. If translation is required, it also typically comprises sequences required for proper translation of the nucleotide sequence. The coding region may code for a protein of interest but may also code for a functional RNA of interest, for example RNAa, siRNA, snoRNA, snRNA, microRNA, ta-siRNA or any other noncoding regulatory RNA, in the sense or antisense direction. The expression construct comprising the nucleotide sequence of interest may be chimeric, meaning that one or more of its components is heterologous with respect to one or more of its other components. The expression construct may also be one, which is naturally occurring but has been obtained in a recombinant form useful for heterologous expression. Typically, however, the expression construct is heterologous with respect to the host, i.e., the particular DNA sequence of the expression construct does not occur naturally in the host cell and must have been introduced into the host cell or an ancestor of the host cell by a transformation event. The expression of the nucleotide sequence in the expression construct may be under the control of a constitutive promoter or of an inducible promoter, which initiates transcription only when the host cell is exposed to some particular external stimulus. In the case of a plant, the promoter can also be specific to a particular tissue or organ or stage of development.

Expression pattern or expression specificity of a regulatory nucleic acid molecule as used herein defines the tissue and/or developmental and/or environmentally modulated expression of a coding sequence or RNA under the control of a distinct regulatory nucleic acid molecule.

Foreign: The term “foreign” refers to any nucleic acid molecule (e.g., gene sequence) which is introduced into the genome of a cell by experimental manipulations and may include sequences found in that cell so long as the introduced sequence contains some modification (e.g., a point mutation, the presence of a selectable marker gene, etc.) and is therefore distinct relative to the naturally-occurring sequence.

Functional linkage: The term “functional linkage” or “functionally linked” is to be understood as meaning, for example, the sequential arrangement of a regulatory element (e.g. a promoter) with a nucleic acid sequence to be expressed and, if appropriate, further regulatory elements (such as e.g., a terminator or an enhancer) in such a way that each of the regulatory elements can fulfill its intended function to allow, modify, facilitate or otherwise influence expression of said nucleic acid sequence. As a synonym the wording “operable linkage” or “operably linked” may be used. Depending on the arrangement of the nucleic acid sequences, the expression may result in sense or antisense RNA. To this end, direct linkage in the chemical sense is not necessarily required. Genetic control sequences such as, for example, enhancer sequences, can also exert their function on the target sequence from positions which are further away, or indeed from other DNA molecules. Preferred arrangements are those in which the nucleic acid sequence to be expressed recombinantly is positioned behind the sequence acting as promoter, so that the two sequences are linked covalently to each other. The distance between the promoter sequence and the nucleic acid sequence to be expressed recombinantly is preferably less than 200 base pairs, especially preferably less than 100 base pairs, very especially preferably less than 50 base pairs. In a preferred embodiment, the nucleic acid sequence to be transcribed is located behind the promoter in such a way that the transcription start is identical with the desired beginning of the chimeric RNA of the invention. Functional linkage, and an expression construct, can be generated by means of customary recombination and cloning techniques as described (e.g., in Maniatis T, Fritsch EF and Sambrook J (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor (NY); Silhavy et al. (1984) Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold Spring Harbor (NY); Ausubel et al. (1987) Current Protocols in Molecular Biology, Greene Publishing Assoc. and Wiley Interscience; Gelvin et al. (Eds) (1990) Plant Molecular Biology Manual; Kluwer Academic Publisher, Dordrecht, The Netherlands). However, further sequences, which, for example, act as a linker with specific cleavage sites for restriction enzymes, or as a signal peptide, may also be positioned between the two sequences. The insertion of sequences may also lead to the expression of fusion proteins. Preferably, the expression construct, consisting of a linkage of a regulatory region, for example a promoter, and a nucleic acid sequence to be expressed, can exist in a vector-integrated form and be inserted into a plant genome, for example by transformation.

Gene: The term “gene” refers to a region operably joined to appropriate regulatory sequences capable of regulating the expression of the gene product (e.g., a polypeptide or a functional RNA) in some manner. A gene includes untranslated regulatory regions of DNA (e.g., promoters, enhancers, repressors, etc.) preceding (upstream) and following (downstream) the coding region (open reading frame, ORF) as well as, where applicable, intervening sequences (i.e., introns) between individual coding regions (i.e., exons). The term “structural gene” as used herein is intended to mean a DNA sequence that is transcribed into mRNA which is then translated into a sequence of amino acids characteristic of a specific polypeptide.

Genome and genomic DNA: The terms “genome” and “genomic DNA” refer to the heritable genetic information of a host organism. Said genomic DNA comprises the DNA of the nucleus (also referred to as chromosomal DNA) but also the DNA of the plastids (e.g., chloroplasts) and other cellular organelles (e.g., mitochondria). Preferably the terms genome or genomic DNA refer to the chromosomal DNA of the nucleus.

Heterologous: The term “heterologous” with respect to a nucleic acid molecule or DNA refers to a nucleic acid molecule which is operably linked to, or is manipulated to become operably linked to, a second nucleic acid molecule to which it is not operably linked in nature, or to which it is operably linked at a different location in nature. A heterologous expression construct comprising a nucleic acid molecule and one or more regulatory nucleic acid molecules (such as a promoter or a transcription termination signal) linked thereto for example is a construct originating by experimental manipulations in which either a) said nucleic acid molecule, or b) said regulatory nucleic acid molecule or c) both (i.e. (a) and (b)) is not located in its natural (native) genetic environment or has been modified by experimental manipulations, an example of a modification being a substitution, addition, deletion, inversion or insertion of one or more nucleotide residues. Natural genetic environment refers to the natural chromosomal locus in the organism of origin, or to the presence in a genomic library. In the case of a genomic library, the natural genetic environment of the sequence of the nucleic acid molecule is preferably retained, at least in part. The environment flanks the nucleic acid sequence at least at one side and has a sequence of at least 50 bp, preferably at least 500 bp, especially preferably at least 1,000 bp, very especially preferably at least 5,000 bp, in length. A naturally occurring expression construct, for example the naturally occurring combination of a promoter with the corresponding gene, becomes a transgenic expression construct when it is modified by non-natural, synthetic “artificial” methods such as, for example, mutagenization. Such methods have been described (U.S. Pat. No. 5,565,350; WO 00/15815). For example a protein encoding nucleic acid molecule operably linked to a promoter, which is not the native promoter of this molecule, is considered to be heterologous with respect to the promoter. Preferably, heterologous DNA is not endogenous to or not naturally associated with the cell into which it is introduced, but has been obtained from another cell or has been synthesized. Heterologous DNA also includes an endogenous DNA sequence, which contains some modification, non-naturally occurring, multiple copies of an endogenous DNA sequence, or a DNA sequence which is not naturally associated with another DNA sequence physically linked thereto. Generally, although not necessarily, heterologous DNA encodes RNA or proteins that are not normally produced by the cell into which it is expressed.

Hybridization: The term “hybridization” as used herein includes “any process by which a strand of nucleic acid molecule joins with a complementary strand through base pairing.” (J. Coombs (1994) Dictionary of Biotechnology, Stockton Press, New York). Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acid molecules) are impacted by such factors as the degree of complementarity between the nucleic acid molecules, stringency of the conditions involved, the Tm of the formed hybrid, and the G:C ratio within the nucleic acid molecules. As used herein, the term “Tm” is used in reference to the “melting temperature.” The melting temperature is the temperature at which a population of double-stranded nucleic acid molecules becomes half dissociated into single strands. The equation for calculating the Tm of nucleic acid molecules is well known in the art. As indicated by standard references, a simple estimate of the Tm value may be calculated by the equation: Tm = 81.5 + 0.41 (% G+C), when a nucleic acid molecule is in aqueous solution at 1 M NaCl [see e.g., Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization (1985)]. Other references include more sophisticated computations, which take structural as well as sequence characteristics into account for the calculation of Tm. Stringent conditions are known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6.
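
The simple estimate Tm = 81.5 + 0.41 (% G+C) given above may be illustrated by the following sketch. The function names `gc_percent` and `simple_tm` are hypothetical; the sketch assumes a plain DNA sequence string and takes the result to be in degrees Celsius, and it does not implement any of the more sophisticated computations mentioned above.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence (0 to 100)."""
    s = seq.upper()
    if not s:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in s) / len(s)


def simple_tm(seq: str) -> float:
    """Simple Tm estimate for a nucleic acid in aqueous solution at 1 M NaCl.

    Implements Tm = 81.5 + 0.41 * (% G+C); the unit is assumed to be deg C.
    """
    return 81.5 + 0.41 * gc_percent(seq)
```

For a sequence with 50% G+C content, such as `"ATGC"`, the estimate is 81.5 + 0.41 × 50 = 102.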

Medium stringency conditions when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 68° C. in a solution consisting of 5×SSPE (43.8 g/L NaCl, 6.9 g/L NaH2PO4.H2O and 1.85 g/L EDTA, pH adjusted to 7.4 with NaOH), 1% SDS, 5×Denhardt's reagent [50×Denhardt's contains the following per 500 mL: 5 g Ficoll (Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 μg/mL denatured salmon sperm DNA followed by washing (preferably once for 15 minutes, more preferably twice for 15 minutes, more preferably three times for 15 minutes) in a solution comprising 1×SSC (1×SSC is 0.15 M NaCl plus 0.015 M sodium citrate) and 0.1% SDS at room temperature or, preferably, at 37° C., when a DNA probe of preferably about 100 to about 500 nucleotides in length is employed.

High stringency conditions when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 68° C. in a solution consisting of 5×SSPE (43.8 g/L NaCl, 6.9 g/L NaH2PO4.H2O and 1.85 g/L EDTA, pH adjusted to 7.4 with NaOH), 1% SDS, 5×Denhardt's reagent [50×Denhardt's contains the following per 500 mL: 5 g Ficoll (Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 μg/mL denatured salmon sperm DNA followed by washing (preferably once for 15 minutes, more preferably twice for 15 minutes, more preferably three times for 15 minutes) in a solution comprising 0.1×SSC (1×SSC is 0.15 M NaCl plus 0.015 M sodium citrate) and 1% SDS at room temperature or, preferably, at 37° C., when a DNA probe of preferably about 100 to about 500 nucleotides in length is employed.

Very high stringency conditions when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 68° C. in a solution consisting of 5×SSPE, 1% SDS, 5×Denhardt's reagent and 100 μg/mL denatured salmon sperm DNA followed by washing (preferably once for 15 minutes, more preferably twice for 15 minutes, more preferably three times for 15 minutes) in a solution comprising 0.1×SSC and 1% SDS at 68° C., when a probe of preferably about 100 to about 500 nucleotides in length is employed.

“Identity”: “Identity” when used in respect to the comparison of two or more nucleic acid or amino acid molecules means that the sequences of said molecules share a certain degree of sequence similarity, the sequences being partially identical. To determine the percentage identity (homology is herein used interchangeably) of two amino acid sequences or of two nucleic acid molecules, the sequences are written one underneath the other for an optimal comparison (for example gaps may be inserted into the sequence of a protein or of a nucleic acid in order to generate an optimal alignment with the other protein or the other nucleic acid).

The amino acid residues or nucleic acid molecules at the corresponding amino acid positions or nucleotide positions are then compared. If a position in one sequence is occupied by the same amino acid residue or the same nucleic acid molecule as the corresponding position in the other sequence, the molecules are homologous at this position (i.e. amino acid or nucleic acid “homology” as used in the present context corresponds to amino acid or nucleic acid “identity”). The percentage identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e. % homology = number of identical positions/total number of positions × 100). The terms “homology” and “identity” are thus to be considered as synonyms.
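
The formula % homology = number of identical positions/total number of positions × 100 may be illustrated by the following sketch, which operates on two sequences that have already been written one underneath the other. The function name `percent_identity`, the gap symbol `-`, and the choice to count every alignment column (including gap columns) as a position are assumptions made for this example only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage identity of two pre-aligned sequences of equal length.

    A position counts as identical when both rows carry the same residue;
    a gap never counts as identical. The total number of positions is taken
    to be the number of alignment columns (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(x == y and x != gap for x, y in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a)
```

For example, the rows `ACGT-A` and `ACCTTA` share 4 identical positions out of 6, which under these assumptions corresponds to an identity of about 66.7%.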

For the determination of the percentage identity of two or more amino acids or of two or more nucleotide sequences several computer software programs have been developed. The identity of two or more sequences can be calculated with for example the software fasta, which presently has been used in the version fasta 3 (W. R. Pearson and D. J. Lipman, PNAS 85, 2444 (1988); W. R. Pearson, Methods in Enzymology 183, 63 (1990)). Another useful program for the calculation of identities of different sequences is the standard blast program, which is included in the Biomax pedant software (Biomax, Munich, Federal Republic of Germany). This leads unfortunately sometimes to suboptimal results since blast does not always include complete sequences of the subject and the query. Nevertheless as this program is very efficient it can be used for the comparison of a huge number of sequences. The following settings are typically used for such a comparison of sequences:

-p Program Name [String]
-d Database [String]; default=nr
-i Query File [File In]; default=stdin
-e Expectation value (E) [Real]; default=10.0
-m alignment view options: 0=pairwise; 1=query-anchored showing identities; 2=query-anchored no identities; 3=flat query-anchored, show identities; 4=flat query-anchored, no identities; 5=query-anchored no identities and blunt ends; 6=flat query-anchored, no identities and blunt ends; 7=XML Blast output; 8=tabular; 9=tabular with comment lines [Integer]; default=0
-o BLAST report Output File [File Out] Optional; default=stdout
-F Filter query sequence (DUST with blastn, SEG with others) [String]; default=T
-G Cost to open a gap (zero invokes default behavior) [Integer]; default=0
-E Cost to extend a gap (zero invokes default behavior) [Integer]; default=0
-X X dropoff value for gapped alignment (in bits) (zero invokes default behavior); blastn 30, megablast 20, tblastx 0, all others 15 [Integer]; default=0
-I Show GI's in deflines [T/F]; default=F
-q Penalty for a nucleotide mismatch (blastn only) [Integer]; default=-3
-r Reward for a nucleotide match (blastn only) [Integer]; default=1
-v Number of database sequences to show one-line descriptions for (V) [Integer]; default=500
-b Number of database sequences to show alignments for (B) [Integer]; default=250
-f Threshold for extending hits, default if zero; blastp 11, blastn 0, blastx 12, tblastn 13; tblastx 13, megablast 0 [Integer]; default=0
-g Perform gapped alignment (not available with tblastx) [T/F]; default=T
-Q Query Genetic code to use [Integer]; default=1
-D DB Genetic code (for tblast[nx] only) [Integer]; default=1
-a Number of processors to use [Integer]; default=1
-O SeqAlign file [File Out] Optional
-J Believe the query defline [T/F]; default=F
-M Matrix [String]; default=BLOSUM62
-W Word size, default if zero (blastn 11, megablast 28, all others 3) [Integer]; default=0
-z Effective length of the database (use zero for the real size) [Real]; default=0
-K Number of best hits from a region to keep (off by default, if used a value of 100 is recommended) [Integer]; default=0
-P 0 for multiple hit, 1 for single hit [Integer]; default=0
-Y Effective length of the search space (use zero for the real size) [Real]; default=0
-S Query strands to search against database (for blast[nx], and tblastx); 3 is both, 1 is top, 2 is bottom [Integer]; default=3
-T Produce HTML output [T/F]; default=F
-l Restrict search of database to list of GI's [String] Optional
-U Use lower case filtering of FASTA sequence [T/F] Optional; default=F
-y X dropoff value for ungapped extensions in bits (0.0 invokes default behavior); blastn 20, megablast 10, all others 7 [Real]; default=0.0
-Z X dropoff value for final gapped alignment in bits (0.0 invokes default behavior); blastn/megablast 50, tblastx 0, all others 25 [Integer]; default=0
-R PSI-TBLASTN checkpoint file [File In] Optional
-n MegaBlast search [T/F]; default=F
-L Location on query sequence [String] Optional
-A Multiple Hits window size, default if zero (blastn/megablast 0, all others 40) [Integer]; default=0
-w Frame shift penalty (OOF algorithm for blastx) [Integer]; default=0
-t Length of the largest intron allowed in tblastn for linking HSPs (0 disables linking) [Integer]; default=0.

Results of high quality are reached by using the algorithm of Needleman and Wunsch or Smith and Waterman. Therefore programs based on said algorithms are preferred. Advantageously the comparisons of sequences can be done with the program PileUp (J. Mol. Evolution., 25, 351 (1987), Higgins et al., CABIOS 5, 151 (1989)) or preferably with the programs “Gap” and “Needle”, which are both based on the algorithms of Needleman and Wunsch (J. Mol. Biol. 48; 443 (1970)), and “BestFit”, which is based on the algorithm of Smith and Waterman (Adv. Appl. Math. 2; 482 (1981)). “Gap” and “BestFit” are part of the GCG software-package (Genetics Computer Group, 575 Science Drive, Madison, Wis., USA 53711 (1991); Altschul et al., (Nucleic Acids Res. 25, 3389 (1997))). “Needle” is part of the European Molecular Biology Open Software Suite (EMBOSS) (Trends in Genetics 16 (6), 276 (2000)). Therefore preferably the calculations to determine the percentages of sequence identity are done with the programs “Gap” or “Needle” over the whole range of the sequences. The following standard adjustments for the comparison of nucleic acid sequences were used for “Needle”: matrix: EDNAFULL, Gap_penalty: 10.0, Extend_penalty: 0.5. The following standard adjustments for the comparison of nucleic acid sequences were used for “Gap”: gap weight: 50, length weight: 3, average match: 10.000, average mismatch: 0.000.

For example a sequence, which is said to have 80% identity with sequence SEQ ID NO: 1 at the nucleic acid level is understood as meaning a sequence which, upon comparison with the sequence represented by SEQ ID NO: 1 by the above program “Needle” with the above parameter set, has an 80% identity. The identity is calculated on the complete length of the query sequence, for example SEQ ID NO: 1.
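
For illustration only, a global alignment in the manner of Needleman and Wunsch may be sketched as follows. This is not the program “Needle”: the sketch uses a single linear gap penalty instead of separate gap opening and extension penalties, scores every match as 5 and every mismatch as -4, and computes identity over the alignment columns. The function names `needleman_wunsch` and `alignment_identity` and all default values are assumptions of this example.

```python
def needleman_wunsch(a: str, b: str, match: int = 5, mismatch: int = -4, gap: int = -10):
    """Global alignment of `a` and `b` with a linear gap penalty.

    Returns (aligned_a, aligned_b, score). Simplified relative to "Needle",
    which uses separate gap opening and extension penalties.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                row_a.append(a[i - 1]); row_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-")
            i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b)), score[n][m]


def alignment_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over all alignment columns (gaps never match)."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)
```

Under these assumptions, aligning `ACGT` with `AGT` gives the rows `ACGT` and `A-GT`, in which 3 of 4 columns are identical (75%).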

Isogenic: organisms (e.g., plants), which are genetically identical, except that they may differ by the presence or absence of a heterologous DNA sequence.

Isolated: The term “isolated” as used herein means that a material has been removed by the hand of man and exists apart from its original, native environment and is therefore not a product of nature. An isolated material or molecule (such as a DNA molecule or enzyme) may exist in a purified form or may exist in a non-native environment such as, for example, in a transgenic host cell. For example, a naturally occurring polynucleotide or polypeptide present in a living plant is not isolated, but the same polynucleotide or polypeptide, separated from some or all of the coexisting materials in the natural system, is isolated. Such polynucleotides can be part of a vector and/or such polynucleotides or polypeptides could be part of a composition, and would be isolated in that such a vector or composition is not part of its original environment. Preferably, the term “isolated” when used in relation to a nucleic acid molecule, as in “an isolated nucleic acid sequence” refers to a nucleic acid sequence that is identified and separated from at least one contaminant nucleic acid molecule with which it is ordinarily associated in its natural source. An isolated nucleic acid molecule is a nucleic acid molecule present in a form or setting that is different from that in which it is found in nature. In contrast, non-isolated nucleic acid molecules are nucleic acid molecules such as DNA and RNA, which are found in the state they exist in nature. For example, a given DNA sequence (e.g., a gene) is found on the host cell chromosome in proximity to neighboring genes; RNA sequences, such as a specific mRNA sequence encoding a specific protein, are found in the cell as a mixture with numerous other mRNAs, which encode a multitude of proteins. However, an isolated nucleic acid sequence comprising for example SEQ ID NO: 1 includes, by way of example, such nucleic acid sequences in cells which ordinarily contain SEQ ID NO: 1 where the nucleic acid sequence is in a chromosomal or extrachromosomal location different from that of natural cells, or is otherwise flanked by a different nucleic acid sequence than that found in nature. The isolated nucleic acid sequence may be present in single-stranded or double-stranded form. When an isolated nucleic acid sequence is to be utilized to express a protein, the nucleic acid sequence will contain at a minimum at least a portion of the sense or coding strand (i.e., the nucleic acid sequence may be single-stranded). Alternatively, it may contain both the sense and anti-sense strands (i.e., the nucleic acid sequence may be double-stranded).

Minimal Promoter: promoter elements, particularly a TATA element, that are inactive or that have greatly reduced promoter activity in the absence of upstream activation. In the presence of a suitable transcription factor, the minimal promoter functions to permit transcription.

Naturally occurring as used herein means a cell or molecule, for example a plant cell or nucleic acid molecule that occurs in a plant or organism which is not manipulated by man, hence which is for example neither mutated nor genetically engineered by man.

Non-coding: The term “non-coding” refers to sequences of nucleic acid molecules that do not encode part or all of an expressed protein. Non-coding sequences include but are not limited to introns, enhancers, promoter regions, 3′ untranslated regions, and 5′ untranslated regions.

Nucleic acids and nucleotides: The terms “Nucleic Acids” and “Nucleotides” refer to naturally occurring or synthetic or artificial nucleic acid or nucleotides. The terms “nucleic acids” and “nucleotides” comprise deoxyribonucleotides or ribonucleotides or any nucleotide analogue and polymers or hybrids thereof in either single- or double-stranded, sense or antisense form. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. The term “nucleic acid” is used interchangeably herein with “gene”, “cDNA”, “mRNA”, “oligonucleotide,” and “polynucleotide”. Nucleotide analogues include nucleotides having modifications in the chemical structure of the base, sugar and/or phosphate, including, but not limited to, 5-position pyrimidine modifications, 8-position purine modifications, modifications at cytosine exocyclic amines, substitution of 5-bromo-uracil, and the like; and 2′-position sugar modifications, including but not limited to, sugar-modified ribonucleotides in which the 2′-OH is replaced by a group selected from H, OR, R, halo, SH, SR, NH₂, NHR, NR₂, or CN. Short hairpin RNAs (shRNAs) also can comprise non-natural elements such as non-natural bases, e.g., inosine and xanthine, non-natural sugars, e.g., 2′-methoxy ribose, or non-natural phosphodiester linkages, e.g., methylphosphonates, phosphorothioates and peptides.

Nucleic acid sequence: The phrase “nucleic acid sequence” refers to a single or double-stranded polymer of deoxyribonucleotide or ribonucleotide bases read from the 5′- to the 3′-end. It includes chromosomal DNA, self-replicating plasmids, infectious polymers of DNA or RNA and DNA or RNA that performs a primarily structural role. “Nucleic acid sequence” also refers to a consecutive list of abbreviations, letters, characters or words, which represent nucleotides. In one embodiment, a nucleic acid can be a “probe” which is a relatively short nucleic acid, usually less than 100 nucleotides in length. Often a nucleic acid probe is from about 10 nucleotides in length to about 50 nucleotides in length. A “target region” of a nucleic acid is a portion of a nucleic acid that is identified to be of interest. A “coding region” of a nucleic acid is the portion of the nucleic acid, which is transcribed and translated in a sequence-specific manner to produce a particular polypeptide or protein when placed under the control of appropriate regulatory sequences. The coding region is said to encode such a polypeptide or protein.

Oligonucleotide: The term “oligonucleotide” refers to an oligomer or polymer of ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) or mimetics thereof, as well as oligonucleotides having non-naturally-occurring portions which function similarly. Such modified or substituted oligonucleotides are often preferred over native forms because of desirable properties such as, for example, enhanced cellular uptake, enhanced affinity for nucleic acid target and increased stability in the presence of nucleases. An oligonucleotide preferably includes two or more nucleomonomers covalently coupled to each other by linkages (e.g., phosphodiesters) or substitute linkages.

Overhang: An “overhang” is a relatively short single-stranded nucleotide sequence on the 5′- or 3′-hydroxyl end of a double-stranded oligonucleotide molecule (also referred to as an “extension,” “protruding end,” or “sticky end”).

Overlapping specificity: The term “overlapping specificity” when used herein in relation to the expression specificity of two or more promoters means that the expression regulated by these promoters occurs partly in the same plant tissues, developmental stages or conditions. For example, a promoter expressed in leaves and a promoter expressed in roots and leaves have an overlap in expression specificity in the leaves of a plant.
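
Purely as an illustrative sketch, the overlap in expression specificity may be viewed as the intersection of the sets of tissues, developmental stages or conditions in which each promoter drives expression. The function name `overlapping_specificity` and the representation of a specificity as a set of labels are assumptions made for this example.

```python
def overlapping_specificity(*specificities):
    """Return the tissues/stages/conditions shared by all given promoters.

    Each argument is an iterable of labels (e.g. tissue names) describing
    where one promoter drives expression. An empty result means no overlap.
    """
    sets = [set(s) for s in specificities]
    if len(sets) < 2:
        raise ValueError("at least two promoters are required")
    return set.intersection(*sets)
```

For the example above, `overlapping_specificity({"leaf"}, {"root", "leaf"})` returns `{"leaf"}`.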

Plant: is generally understood as meaning any eukaryotic single- or multi-celled organism or a cell, tissue, organ, part or propagation material (such as seeds or fruit) of same which is capable of photosynthesis. Included for the purpose of the invention are all genera and species of higher and lower plants of the Plant Kingdom. Annual, perennial, monocotyledonous and dicotyledonous plants are preferred. The term includes the mature plants, seed, shoots and seedlings and their derived parts, propagation material (such as seeds or microspores), plant organs, tissue, protoplasts, callus and other cultures, for example cell cultures, and any other type of plant cell grouping to give functional or structural units. Mature plants refer to plants at any desired developmental stage beyond that of the seedling. Seedling refers to a young immature plant at an early developmental stage. Annual, biennial, monocotyledonous and dicotyledonous plants are preferred host organisms for the generation of transgenic plants. The expression of genes is furthermore advantageous in all ornamental plants, useful or ornamental trees, flowers, cut flowers, shrubs or lawns. Plants which may be mentioned by way of example but not by limitation are angiosperms, bryophytes such as, for example, Hepaticae (liverworts) and Musci (mosses); Pteridophytes such as ferns, horsetail and club mosses; gymnosperms such as conifers, cycads, ginkgo and Gnetatae; algae such as Chlorophyceae, Phaeophyceae, Rhodophyceae, Myxophyceae, Xanthophyceae, Bacillariophyceae (diatoms), and Euglenophyceae. Preferred are plants which are used for food or feed purpose such as the families of the Leguminosae such as pea, alfalfa and soya; Gramineae such as rice, maize, wheat, barley, sorghum, millet, rye, triticale, or oats; the family of the Umbelliferae, especially the genus Daucus, very especially the species carota (carrot) and Apium, very especially the species graveolens dulce (celery) and many others; the family of the Solanaceae, especially the genus Lycopersicon, very especially the species esculentum (tomato) and the genus Solanum, very especially the species tuberosum (potato) and melongena (egg plant), and many others (such as tobacco); and the genus Capsicum, very especially the species annuum (peppers) and many others; the family of the Leguminosae, especially the genus Glycine, very especially the species max (soybean), alfalfa, pea, lucerne, beans or peanut and many others; and the family of the Cruciferae (Brassicaceae), especially the genus Brassica, very especially the species napus (oil seed rape), campestris (beet), oleracea cv Tastie (cabbage), oleracea cv Snowball Y (cauliflower) and oleracea cv Emperor (broccoli); and of the genus Arabidopsis, very especially the species thaliana and many others; the family of the Compositae, especially the genus Lactuca, very especially the species sativa (lettuce) and many others; the family of the Asteraceae such as sunflower, Tagetes, lettuce or Calendula and many others; the family of the Cucurbitaceae such as melon, pumpkin/squash or zucchini, and linseed. Further preferred are cotton, sugar cane, hemp, flax, chillies, and the various tree, nut and wine species.

Polypeptide: The terms “polypeptide”, “peptide”, “oligopeptide”, “gene product”, “expression product” and “protein” are used interchangeably herein to refer to a polymer or oligomer of consecutive amino acid residues.

Pre-protein: A protein which is normally targeted to a cellular organelle, such as a chloroplast, and which still comprises its transit peptide.

Primary transcript: The term “primary transcript” as used herein refers to a premature RNA transcript of a gene. A “primary transcript” for example still comprises introns and/or does not yet comprise a polyA tail or a cap structure and/or is missing other modifications necessary for its correct function as a transcript, such as for example trimming or editing.

Promoter: The terms “promoter” or “promoter sequence” are equivalents and, as used herein, refer to a DNA sequence which when ligated to a nucleotide sequence of interest is capable of controlling the transcription of the nucleotide sequence of interest into RNA. Such promoters can for example be found in the following public databases: http://www.grassius.org/grasspromdb.html, http://mendel.cs.rhul.ac.uk/mendel.php?topic=plantprom, http://ppdb.gene.nagoya-u.ac.jp/cgibin/index.cgi. Promoters listed there may be addressed with the methods of the invention and are herewith included by reference. A promoter is located 5′ (i.e., upstream), proximal to the transcriptional start site of a nucleotide sequence of interest whose transcription into mRNA it controls, and provides a site for specific binding by RNA polymerase and other transcription factors for initiation of transcription. Said promoter comprises for example the at least 10 kb, for example 5 kb or 2 kb proximal to the transcription start site. It may also comprise the at least 1500 bp proximal to the transcriptional start site, preferably the at least 1000 bp, more preferably the at least 500 bp, even more preferably the at least 400 bp, the at least 300 bp, the at least 200 bp or the at least 100 bp. In a further preferred embodiment, the promoter comprises the at least 50 bp proximal to the transcription start site, for example, at least 25 bp. The promoter does not comprise exon and/or intron regions or 5′ untranslated regions. The promoter may for example be heterologous or homologous to the respective plant. A polynucleotide sequence is “heterologous to” an organism or a second polynucleotide sequence if it originates from a foreign species, or, if from the same species, is modified from its original form.
For example, a promoter operably linked to a heterologous coding sequence refers to a coding sequence from a species different from that from which the promoter was derived, or, if from the same species, a coding sequence which is not naturally associated with the promoter (e.g. a genetically engineered coding sequence or an allele from a different ecotype or variety). Suitable promoters can be derived from genes of the host cells where expression should occur or from pathogens for these host cells (e.g., plants or plant pathogens like plant viruses). A plant specific promoter is a promoter suitable for regulating expression in a plant. It may be derived from a plant but also from plant pathogens or it might be a synthetic promoter designed by man. If a promoter is an inducible promoter, then the rate of transcription increases in response to an inducing agent. Also, the promoter may be regulated in a tissue-specific or tissue preferred manner such that it is only or predominantly active in transcribing the associated coding region in a specific tissue type(s) such as leaves, roots or meristem. The term “tissue specific” as it applies to a promoter refers to a promoter that is capable of directing selective expression of a nucleotide sequence of interest to a specific type of tissue (e.g., petals) in the relative absence of expression of the same nucleotide sequence of interest in a different type of tissue (e.g., roots). Tissue specificity of a promoter may be evaluated by, for example, operably linking a reporter gene to the promoter sequence to generate a reporter construct, introducing the reporter construct into the genome of a plant such that the reporter construct is integrated into every tissue of the resulting transgenic plant, and detecting the expression of the reporter gene (e.g., detecting mRNA, protein, or the activity of a protein encoded by the reporter gene) in different tissues of the transgenic plant.
The detection of a greater level of expression of the reporter gene in one or more tissues relative to the level of expression of the reporter gene in other tissues shows that the promoter is specific for the tissues in which greater levels of expression are detected. The term “cell type specific” as applied to a promoter refers to a promoter which is capable of directing selective expression of a nucleotide sequence of interest in a specific type of cell in the relative absence of expression of the same nucleotide sequence of interest in a different type of cell within the same tissue. The term “cell type specific” when applied to a promoter also means a promoter capable of promoting selective expression of a nucleotide sequence of interest in a region within a single tissue. Cell type specificity of a promoter may be assessed using methods well known in the art, e.g., GUS activity staining, GFP protein or immunohistochemical staining. The term “constitutive” when made in reference to a promoter or the expression derived from a promoter means that the promoter is capable of directing transcription of an operably linked nucleic acid molecule in the absence of a stimulus (e.g., heat shock, chemicals, light, etc.) in the majority of plant tissues and cells throughout substantially the entire lifespan of a plant or part of a plant. Typically, constitutive promoters are capable of directing expression of a transgene in substantially any cell and any tissue.

Promoter specificity: The term “specificity” when referring to a promoter means the pattern of expression conferred by the respective promoter. The specificity describes the tissues and/or developmental status of a plant or part thereof, in which the promoter is conferring expression of the nucleic acid molecule under the control of the respective promoter. Specificity of a promoter may also comprise the environmental conditions under which the promoter may be activated or down-regulated, such as induction or repression by biological or environmental stresses such as cold, drought, wounding or infection.

Purified: As used herein, the term “purified” refers to molecules, either nucleic or amino acid sequences, that are removed from their natural environment, isolated or separated. “Substantially purified” molecules are at least 60% free, preferably at least 75% free, and more preferably at least 90% free from other components with which they are naturally associated. A purified nucleic acid sequence may be an isolated nucleic acid sequence.

Recombinant: The term “recombinant” with respect to nucleic acid molecules refers to nucleic acid molecules produced by recombinant DNA techniques. Recombinant nucleic acid molecules as such do not exist in nature but are modified, changed, mutated or otherwise manipulated by man. A “recombinant nucleic acid molecule” is a non-naturally occurring nucleic acid molecule that differs in sequence from a naturally occurring nucleic acid molecule by at least one nucleic acid. The term “recombinant nucleic acid molecule” may also comprise a “recombinant construct” which comprises, preferably operably linked, a sequence of nucleic acid molecules, which are not naturally occurring in that order wherein each of the nucleic acid molecules may or may not be a recombinant nucleic acid molecule. Preferred methods for producing said recombinant nucleic acid molecule may comprise cloning techniques, directed or non-directed mutagenesis, synthesis or recombination techniques.

Sense: The term “sense” is understood to mean a nucleic acid molecule having a sequence which is complementary or identical to a target sequence, for example a sequence which binds to a protein transcription factor and which is involved in the expression of a given gene. According to a preferred embodiment, the nucleic acid molecule comprises a gene of interest and elements allowing the expression of the said gene of interest.

Starting sequence: The term “starting sequence” when used herein defines the sequence of a promoter of a defined specificity which is used as a reference sequence for analysis of the presence of motives. The starting sequence is referred to for the definition of the degree of identity to the sequences of the promoters of the invention. The starting sequence could be any wild-type, naturally occurring promoter sequence or any artificial promoter sequence. The sequence of a synthetic promoter sequence produced with the method of the invention may also be used as a starting sequence.

Substantially complementary: In its broadest sense, the term “substantially complementary”, when used herein with respect to a nucleotide sequence in relation to a reference or target nucleotide sequence, means a nucleotide sequence having a percentage of identity between the substantially complementary nucleotide sequence and the exact complementary sequence of said reference or target nucleotide sequence of at least 60%, more desirably at least 70%, more desirably at least 80% or 85%, preferably at least 90%, more preferably at least 93%, still more preferably at least 95% or 96%, yet still more preferably at least 97% or 98%, yet still more preferably at least 99% or most preferably 100% (the latter being equivalent to the term “identical” in this context). Preferably identity is assessed over a length of at least 19 nucleotides, preferably at least 50 nucleotides, more preferably the entire length of the nucleic acid sequence to said reference sequence (if not specified otherwise below). Sequence comparisons are carried out using default GAP analysis with the University of Wisconsin GCG, SEQWEB application of GAP, based on the algorithm of Needleman and Wunsch (Needleman and Wunsch (1970) J. Mol. Biol. 48: 443-453; as defined above). A nucleotide sequence “substantially complementary” to a reference nucleotide sequence hybridizes to the reference nucleotide sequence under low stringency conditions, preferably medium stringency conditions, most preferably high stringency conditions (as defined above).
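
By way of illustration only, the percentage of identity to the exact complementary sequence may be sketched in Python as follows. The sketch assumes that the exact complementary sequence is the reverse complement of the reference and that both sequences have equal length, so that they can be compared position by position without gaps. It is not the GAP analysis (Needleman and Wunsch) referred to above; the function names and the example sequence are hypothetical.

```python
# Illustrative sketch only: ungapped identity to the reverse complement.
# This is NOT the GCG GAP / Needleman-Wunsch analysis named in the text.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("an ungapped comparison needs sequences of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def is_substantially_complementary(candidate: str, reference: str,
                                   threshold: float = 60.0) -> bool:
    """True if the candidate reaches the identity threshold (in percent)
    against the reverse complement of the reference."""
    return percent_identity(candidate, reverse_complement(reference)) >= threshold


# Hypothetical 20-nucleotide reference (above the 19-nucleotide minimum length).
reference = "ATGCATGCATGCATGCATGC"
exact = reverse_complement(reference)
one_mismatch = "T" + exact[1:]  # first base changed from G to T
print(percent_identity(exact, exact))
print(percent_identity(one_mismatch, exact))
```

The default threshold of 60% corresponds to the broadest range given in the definition; the narrower preferred ranges can be tested by passing a higher threshold.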

Transgene: The term “transgene” as used herein refers to any nucleic acid sequence which is introduced into the genome of a cell by experimental manipulations. A transgene may be an “endogenous DNA sequence” or a “heterologous DNA sequence” (i.e., “foreign DNA”). The term “endogenous DNA sequence” refers to a nucleotide sequence which is naturally found in the cell into which it is introduced so long as it does not contain some modification (e.g., a point mutation, the presence of a selectable marker gene, etc.) relative to the naturally-occurring sequence.

Transgenic: The term “transgenic” when referring to an organism means transformed, preferably stably transformed, with a recombinant DNA molecule that preferably comprises a suitable promoter operatively linked to a DNA sequence of interest.

Vector: As used herein, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid molecule to which it has been linked. One type of vector is a genomic integrated vector, or “integrated vector”, which can become integrated into the chromosomal DNA of the host cell. Another type of vector is an episomal vector, i.e., a nucleic acid molecule capable of extra-chromosomal replication. Vectors capable of directing the expression of genes to which they are operatively linked are referred to herein as “expression vectors”. In the present specification, “plasmid” and “vector” are used interchangeably unless otherwise clear from the context. Expression vectors designed to produce RNAs as described herein in vitro or in vivo may contain sequences recognized by any RNA polymerase, including mitochondrial RNA polymerase, RNA pol I, RNA pol II, and RNA pol III. These vectors can be used to transcribe the desired RNA molecule in the cell according to this invention. A plant transformation vector is to be understood as a vector suitable in the process of plant transformation.

Wild-type: The terms “wild-type”, “natural” or “natural origin” mean, with respect to an organism, polypeptide, or nucleic acid sequence, that said organism, polypeptide, or nucleic acid sequence is naturally occurring or available in at least one naturally occurring organism which is not changed, mutated, or otherwise manipulated by man.

EXAMPLES

Chemicals and Common Methods

Unless indicated otherwise, cloning procedures carried out for the purposes of the present invention, including restriction digest, agarose gel electrophoresis, purification of nucleic acids, ligation of nucleic acids, transformation, selection and cultivation of bacterial cells, were performed as described (Sambrook et al., 1989). Sequence analyses of recombinant DNA were performed with a laser fluorescence DNA sequencer (Applied Biosystems, Foster City, Calif., USA) using the Sanger technology (Sanger et al., 1977). Unless described otherwise, chemicals and reagents were obtained from Sigma Aldrich (Sigma Aldrich, St. Louis, USA), from Promega (Madison, Wis., USA), Duchefa (Haarlem, The Netherlands) or Invitrogen (Carlsbad, Calif., USA). Restriction endonucleases were from New England Biolabs (Ipswich, Mass., USA) or Roche Diagnostics GmbH (Penzberg, Germany). Oligonucleotides were synthesized by Eurofins MWG Operon (Ebersberg, Germany).

Example 1

1.1 Directed Permutation of the Promoter Sequence

Using publicly available data, two promoters showing seed specific expression in plants were selected for analyzing the effects of sequence permutation in periodic intervals throughout the full length of the promoter DNA sequence (WO2009016202, WO2009133145). The wildtype or starting sequences of the Phaseolus vulgaris p-PvARC5 (SEQ ID NO 1) (with the prefix p- denoting promoter) and the Vicia faba p-VfSBP (SEQ ID NO 3) promoters were analyzed and annotated for the occurrence of motives, boxes, cis-regulatory elements using e.g. the GEMS Launcher Software (www.genomatix.de) with default parameters (core similarity 0.75, matrix similarity 0.75).

The “core sequence” of a matrix is defined as the (usually four) consecutive most highly conserved positions of the matrix.

The core similarity is calculated as described here and in the papers related to MatInspector (Cartharius K, et al. (2005) Bioinformatics 21; Cartharius K (2005), DNA Press; Quandt K, et al. (1995) Nucleic Acids Res. 23).

The maximum core similarity of 1.0 is only reached when the highest conserved bases of a matrix match exactly in the sequence. More important than the core similarity is the matrix similarity, which takes into account all bases over the whole matrix length. The matrix similarity is calculated as described here and in the MatInspector paper. A perfect match to the matrix gets a score of 1.00 (each sequence position corresponds to the highest conserved nucleotide at that position in the matrix); a “good” match to the matrix has a similarity of >0.80.

Mismatches in highly conserved positions of the matrix decrease the matrix similarity more than mismatches in less conserved regions.
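
The behaviour described above, namely that a perfect match scores 1.00 and that mismatches at highly conserved positions cost more than mismatches at weakly conserved positions, may be illustrated with the following Python sketch. It weights each matrix position by a conservation index, loosely modelled on the scoring published for MatInspector; the exact formula used by the software should be taken from the cited papers. The toy matrix, the function names and the test sites are hypothetical and are not taken from the Genomatix matrix library.

```python
# Illustrative sketch only: conservation-weighted similarity of a site to a
# nucleotide frequency matrix. The toy matrix below is an assumption.
import math


def conservation_index(column: dict[str, float]) -> float:
    """Conservation of one matrix column, from relative base frequencies.
    Fully conserved columns give 100; less conserved columns give lower values."""
    entropy_term = sum(p * math.log(p) for p in column.values() if p > 0)
    return (100.0 / math.log(5)) * (entropy_term + math.log(5))


def matrix_similarity(matrix: list[dict[str, float]], site: str) -> float:
    """Weighted score of the site divided by the best achievable weighted score."""
    if len(site) != len(matrix):
        raise ValueError("site and matrix must have the same length")
    weighted_score = 0.0
    weighted_max = 0.0
    for column, base in zip(matrix, site.upper()):
        ci = conservation_index(column)
        weighted_score += ci * column.get(base, 0.0)
        weighted_max += ci * max(column.values())
    return weighted_score / weighted_max


# Hypothetical 4-position matrix: positions 1 and 4 are fully conserved,
# position 2 is weakly conserved, position 3 is strongly conserved.
TOY_MATRIX = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    {"A": 0.1, "C": 0.0, "G": 0.9, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
]

for site in ("AAGT", "ACGT", "CAGT"):
    print(site, round(matrix_similarity(TOY_MATRIX, site), 3))
```

In this toy example the site with a mismatch at the fully conserved first position scores lower than the site with a mismatch at the weakly conserved second position.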

Opt. gives the optimized matrix threshold: This matrix similarity is the optimized value defined in a way that a minimum number of matches is found in non-regulatory test sequences (i.e. with this matrix similarity the number of false positive matches is minimized). This matrix similarity is used when the user checks “Optimized” as the matrix similarity threshold for MatInspector.

In the following, the DNA sequences of the promoters were permutated according to the method of the invention to yield p-PvArc5_perm (SEQ ID NO 2) and p-VfSBP_perm (SEQ ID NO 4). In case of the p-PvArc5 promoter 6.6% of the motives not associated with seed specific/preferential expression and transcription initiation have been altered, in case of the p-VfSBP 7.8%. DNA permutation was conducted in a way to not affect cis regulatory elements which have been associated previously with seed specific gene expression or initiation of transcription, and permutations were distributed periodically over the full promoter DNA sequence with less than 46 nucleotides between permutated nucleotide positions and within a stretch of 5 nucleotides having at least one nucleotide permutated. Permutations were carried out with the aim to keep most of the cis regulatory elements, boxes, motives present in the native promoter and to avoid creating new putative cis regulatory elements, boxes, motives.
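
The spacing criterion stated above (less than 46 nucleotides between permutated nucleotide positions) may, for example, be checked with a short Python sketch such as the following. It assumes that the starting and the permutated sequence have the same length, i.e. that permutation was carried out by nucleotide exchange only, and it also counts the unchanged stretches at both ends of the sequence. The function names are hypothetical and the sequences are toy data, not SEQ ID NO 1 to 4.

```python
# Illustrative sketch only: locate exchanged nucleotides and measure the
# longest stretch of unchanged nucleotides between them.


def changed_positions(start_seq: str, perm_seq: str) -> list[int]:
    """0-based positions at which the permutated sequence differs."""
    if len(start_seq) != len(perm_seq):
        raise ValueError("sequences must have the same length")
    pairs = zip(start_seq.upper(), perm_seq.upper())
    return [i for i, (a, b) in enumerate(pairs) if a != b]


def longest_unchanged_run(start_seq: str, perm_seq: str) -> int:
    """Length of the longest run of unchanged nucleotides, ends included."""
    positions = changed_positions(start_seq, perm_seq)
    bounds = [-1] + positions + [len(start_seq)]
    return max(right - left - 1 for left, right in zip(bounds, bounds[1:]))


def satisfies_spacing(start_seq: str, perm_seq: str, max_gap: int = 46) -> bool:
    """True if fewer than max_gap unchanged nucleotides lie between changes."""
    return longest_unchanged_run(start_seq, perm_seq) < max_gap


# Toy data: a 100-nucleotide sequence with exchanges at positions 10, 50 and 90.
start = "A" * 100
perm_list = list(start)
for pos in (10, 50, 90):
    perm_list[pos] = "C"
perm = "".join(perm_list)

print(changed_positions(start, perm))
print(longest_unchanged_run(start, perm))
print(satisfies_spacing(start, perm))
```

Such a check only covers the spacing of the exchanges; whether an exchange destroys or creates a motif still has to be assessed by re-annotating the permutated sequence.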

The lists of motives, boxes, cis regulatory elements in the PvARC5 promoter before and after the permutation are shown in Tables 1 and 2.

The lists of motives, boxes, cis regulatory elements in the VfSBP promoter before and after the permutation are shown in Tables 3 and 4.

Empty lines represent motives, boxes, cis regulatory elements not found in one sequence but present in the corresponding sequence, hence, motives, boxes, cis regulatory elements that were deleted from the starting sequence or that were introduced into the permutated sequence.
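
The comparison underlying these empty lines may be sketched in Python as follows: each annotated hit is identified by its matrix, position and strand, and the two annotations are compared as sets. The example rows are a small excerpt taken from Tables 1 and 2; the data structure and the function name are hypothetical and do not reflect how the annotation software reports its results.

```python
# Illustrative sketch only: which annotated hits were deleted from the
# starting sequence and which were introduced into the permutated sequence.
from typing import NamedTuple


class Hit(NamedTuple):
    matrix: str
    start: int
    end: int
    strand: str


def compare_hits(starting: list[Hit], permutated: list[Hit]) -> dict[str, list[Hit]]:
    """Split hits into deleted, introduced and retained, sorted by position."""
    before, after = set(starting), set(permutated)

    def by_position(hit: Hit) -> tuple:
        return (hit.start, hit.end, hit.matrix, hit.strand)

    return {
        "deleted": sorted(before - after, key=by_position),
        "introduced": sorted(after - before, key=by_position),
        "retained": sorted(before & after, key=by_position),
    }


# Excerpt of the PvARC5 annotation (Table 1) and its permutated form (Table 2).
table_1 = [
    Hit("P$GAAA.01", 9, 25, "+"),
    Hit("P$ID1.01", 36, 48, "-"),
    Hit("P$TEF1.01", 111, 131, "-"),
    Hit("P$CCA1.01", 113, 127, "+"),
]
table_2 = [
    Hit("P$GAAA.01", 9, 25, "+"),
    Hit("P$ID1.01", 36, 48, "-"),
    Hit("P$STK.01", 58, 72, "+"),
]

result = compare_hits(table_1, table_2)
for label, hits in result.items():
    print(label, [f"{h.matrix} {h.start}-{h.end} ({h.strand})" for h in hits])
```

Hits whose position is unchanged but whose matrix similarity differs between the two tables are treated as retained in this sketch, because the similarity values are not part of the comparison key.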

TABLE 1 Boxes and Motifs identified in the starting sequence of the PvARC5 promoter

PvARC5 promoter
Family | Further Family Information | Matrix | Opt. | Position from-to | Strand | Core sim. | Matrix sim.
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 9-25 | (+) | 1 | 0.862
P$IDDF | ID domain factors | P$ID1.01 | 0.92 | 36-48 | (−) | 1 | 0.922
P$MYBL | MYB-like proteins | P$ATMYB77.01 | 0.87 | 47-63 | (+) | 1 | 0.887
P$RAV5 | 5′-part of bipartite RAV1 binding site | P$RAV1-5.01 | 0.96 | 48-58 | (+) | 1 | 0.96
P$MYBL | MYB-like proteins | P$GAMYB.01 | 0.91 | 52-68 | (−) | 1 | 0.932
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 75-85 | (+) | 0.97 | 0.988
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 84-94 | (−) | 1 | 0.963
P$MIIG | MYB IIG-type binding sites | P$PALBOXL.01 | 0.80 | 87-101 | (+) | 0.84 | 0.806
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 106-116 | (+) | 1 | 0.99
P$GAPB | GAP-Box (light response elements) | P$GAP.01 | 0.88 | 108-122 | (+) | 0.81 | 0.884
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 110-120 | (−) | 1 | 0.963
P$TEFB | TEF-box | P$TEF1.01 | 0.76 | 111-131 | (−) | 0.96 | 0.761
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 113-127 | (+) | 0.77 | 0.856
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 121-137 | (−) | 1 | 0.964
P$GAGA | GAGA elements | P$GAGABP.01 | 0.75 | 125-149 | (−) | 0.75 | 0.768
P$NCS2 | Nodulin consensus sequence 2 | P$NCS2.01 | 0.79 | 126-140 | (+) | 1 | 0.799
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 144-154 | (−) | 1 | 0.923
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 147-157 | (−) | 1 | 0.928
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 151-167 | (+) | 1 | 0.839
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 164-174 | (−) | 1 | 0.898
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 175-191 | (+) | 0.75 | 0.872
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 177-187 | (+) | 0.83 | 0.902
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 177-187 | (−) | 1 | 1
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 184-200 | (+) | 0.75 | 0.797
P$TELO | Telo box (plant interstitial telomere motifs) | P$ATPURA.01 | 0.85 | 186-200 | (−) | 0.75 | 0.857
P$NCS2 | Nodulin consensus sequence 2 | P$NCS2.01 | 0.79 | 213-227 | (−) | 1 | 0.826
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 233-251 | (+) | 0.75 | 0.824
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 238-254 | (+) | 0.82 | 0.798
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 261-271 | (−) | 1 | 0.851
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 264-280 | (+) | 1 | 0.774
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 267-283 | (+) | 1 | 0.872
P$SPF1 | Sweet potato DNA-binding factor with two WRKY-domains | P$SP8BF.01 | 0.87 | 298-310 | (−) | 1 | 0.872
P$BRRE | Brassinosteroid (BR) response element | P$BZR1.01 | 0.95 | 303-319 | (−) | 1 | 0.953
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.02 | 0.76 | 319-335 | (+) | 0.89 | 0.762
P$GBOX | Plant G-box/C-box bZIP proteins | P$TGA1.01 | 0.90 | 327-347 | (−) | 1 | 0.909
P$GTBX | GT-box elements | P$GT1.01 | 0.85 | 337-353 | (+) | 1 | 0.854
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 337-353 | (−) | 1 | 0.935
P$OPAQ | Opaque-2 like transcriptional activators | P$O2.01 | 0.87 | 351-367 | (−) | 1 | 0.919
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 362-378 | (−) | 0.75 | 0.797
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 367-377 | (−) | 1 | 0.788
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 367-377 | (+) | 1 | 0.926
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 367-383 | (+) | 1 | 0.894
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 369-385 | (+) | 1 | 0.827
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 371-381 | (−) | 1 | 1
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 396-412 | (−) | 1 | 0.857
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 397-407 | (+) | 1 | 0.921
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 401-411 | (−) | 1 | 0.916
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 403-419 | (−) | 1 | 0.9
P$MYBS | MYB proteins with single DNA binding repeat | P$OSMYBS.01 | 0.82 | 416-432 | (+) | 0.75 | 0.837
P$TELO | Telo box (plant interstitial telomere motifs) | P$ATPURA.01 | 0.85 | 440-454 | (−) | 0.75 | 0.854
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 461-479 | (−) | 0.75 | 0.826
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 468-478 | (+) | 1 | 0.892
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 473-489 | (−) | 1 | 0.913
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 476-490 | (−) | 1 | 0.889
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 482-498 | (+) | 1 | 0.831
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 499-513 | (−) | 1 | 0.91
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 499-517 | (−) | 1 | 0.878
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 509-525 | (−) | 1 | 0.885
P$GARP | Myb-related DNA binding proteins (Golden2, ARR, Psr) | P$ARR10.01 | 0.97 | 540-548 | (+) | 1 | 0.976
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 558-568 | (−) | 1 | 0.775
P$L1BX | L1 box, motif for L1 layer-specific expression | P$PDF2.01 | 0.85 | 558-574 | (−) | 1 | 0.865
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 558-568 | (−) | 0.88 | 0.927
P$EINL | Ethylen insensitive 3 like factors | P$TEIL.01 | 0.92 | 572-580 | (+) | 1 | 0.921
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 583-593 | (+) | 0.94 | 0.977
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 583-593 | (−) | 0.83 | 0.94
P$L1BX | L1 box, motif for L1 layer-specific expression | P$HDG9.01 | 0.77 | 607-623 | (+) | 1 | 0.772
P$IBOX | Plant I-Box sites | P$IBOX.01 | 0.81 | 610-626 | (+) | 0.75 | 0.824
P$MYBS | MYB proteins with single DNA binding repeat | P$MYBST1.01 | 0.90 | 613-629 | (−) | 1 | 0.953
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 616-632 | (+) | 1 | 0.942
P$TEFB | TEF-box | P$TEF1.01 | 0.76 | 616-636 | (+) | 0.96 | 0.778
P$MYBS | MYB proteins with single DNA binding repeat | P$TAMYB80.01 | 0.83 | 625-641 | (−) | 1 | 0.859
O$PTBP | Plant TATA binding protein factor | O$PTATA.02 | 0.90 | 631-645 | (+) | 1 | 0.927
O$PTBP | Plant TATA binding protein factor | O$PTATA.02 | 0.90 | 632-646 | (−) | 1 | 0.929
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 646-662 | (−) | 0.75 | 0.825
P$L1BX | L1 box, motif for L1 layer-specific expression | P$HDG9.01 | 0.77 | 648-664 | (+) | 1 | 0.791
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 649-663 | (−) | 1 | 0.902
P$DOFF | DNA binding with one finger (DOF) | P$PBF.01 | 0.97 | 654-670 | (+) | 1 | 0.979
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 682-692 | (−) | 1 | 0.975
P$TEFB | TEF-box | P$TEF1.01 | 0.76 | 696-716 | (−) | 0.84 | 0.78
P$MYBL | MYB-like proteins | P$CARE.01 | 0.83 | 699-715 | (−) | 1 | 0.88
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 704-730 | (+) | 1 | 0.94
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.01 | 0.77 | 716-736 | (−) | 0.75 | 0.856
P$GBOX | Plant G-box/C-box bZIP proteins | P$ROM.01 | 0.85 | 717-737 | (+) | 1 | 1
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 719-735 | (−) | 0.75 | 0.857
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.02 | 0.84 | 722-742 | (−) | 0.75 | 0.862
P$MYCL | Myc-like basic helix-loop-helix binding factors | P$MYCRS.01 | 0.93 | 739-757 | (−) | 0.86 | 0.943
P$OPAQ | Opaque-2 like transcriptional activators | P$GCN4.01 | 0.81 | 745-761 | (−) | 1 | 0.85
P$AREF | Auxin response element | P$ARE.01 | 0.93 | 747-759 | (+) | 1 | 0.941
P$TEFB | TEF-box | P$TEF1.01 | 0.76 | 783-803 | (−) | 0.84 | 0.78
P$MYBL | MYB-like proteins | P$CARE.01 | 0.83 | 786-802 | (−) | 1 | 0.876
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 788-814 | (−) | 1 | 0.929
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 791-817 | (+) | 1 | 0.984
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.01 | 0.77 | 796-820 | (+) | 1 | 0.812
P$GBOX | Plant G-box/C-box bZIP proteins | P$CPRF.01 | 0.95 | 803-823 | (−) | 1 | 0.989
P$GBOX | Plant G-box/C-box bZIP proteins | P$CPRF.01 | 0.95 | 804-824 | (+) | 1 | 0.98
P$MYCL | Myc-like basic helix-loop-helix binding factors | P$MYCRS.01 | 0.93 | 804-822 | (−) | 1 | 0.956
P$ABRE | ABA response elements | P$ABRE.01 | 0.82 | 805-821 | (+) | 1 | 0.874
P$MYCL | Myc-like basic helix-loop-helix binding factors | P$PIF3.01 | 0.82 | 805-823 | (+) | 1 | 0.914
P$OPAQ | Opaque-2 like transcriptional activators | P$RITA1.01 | 0.95 | 805-821 | (−) | 1 | 0.992
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 806-822 | (−) | 1 | 0.977
P$OPAQ | Opaque-2 like transcriptional activators | P$RITA1.01 | 0.95 | 806-822 | (+) | 1 | 0.973
P$OCSE | Enhancer element first identified in the promoter of the octopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNA | P$OCSTF.01 | 0.73 | 809-829 | (−) | 0.85 | 0.747
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 823-839 | (−) | 1 | 0.794
P$LFYB | LFY binding site | P$LFY.01 | 0.93 | 839-851 | (−) | 0.91 | 0.935
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 840-866 | (−) | 1 | 0.948
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 843-869 | (+) | 1 | 0.966
P$LEGB | Legumin Box family | P$IDE1.01 | 0.77 | 847-873 | (+) | 1 | 0.779
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.01 | 0.77 | 855-875 | (−) | 0.75 | 0.856
P$GBOX | Plant G-box/C-box bZIP proteins | P$ROM.01 | 0.85 | 856-876 | (+) | 1 | 1
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 858-874 | (−) | 0.75 | 0.857
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.02 | 0.84 | 861-881 | (−) | 0.75 | 0.862
P$SALT | Salt/drought responsive elements | P$ALFIN1.02 | 0.95 | 871-885 | (−) | 1 | 0.963
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 895-921 | (+) | 1 | 0.927
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.01 | 0.77 | 907-927 | (−) | 0.75 | 0.856
P$GBOX | Plant G-box/C-box bZIP proteins | P$ROM.01 | 0.85 | 908-928 | (+) | 1 | 0.938
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 910-926 | (−) | 0.75 | 0.857
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.02 | 0.84 | 913-933 | (−) | 0.75 | 0.871
P$MADS | MADS box proteins | P$SQUA.01 | 0.90 | 960-980 | (−) | 1 | 0.908
P$L1BX | L1 box, motif for L1 layer-specific expression | P$PDF2.01 | 0.85 | 963-979 | (+) | 1 | 0.856
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 972-982 | (+) | 1 | 0.858
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 974-988 | (−) | 0.83 | 0.886
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 974-990 | (+) | 0.75 | 0.83
O$VTBP | Vertebrate TATA binding protein factor | O$MTATA.01 | 0.84 | 976-992 | (+) | 1 | 0.843
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 983-999 | (−) | 1 | 0.787
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 984-1002 | (−) | 1 | 0.818
P$AHBP | Arabidopsis homeobox protein | P$ATHB1.01 | 0.90 | 991-1001 | (+) | 1 | 0.989
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 991-1001 | (−) | 1 | 0.943
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 992-1006 | (+) | 1 | 0.913
P$SPF1 | Sweet potato DNA-binding factor with two WRKY-domains | P$SP8BF.01 | 0.87 | 1003-1015 | (+) | 1 | 0.881
P$OCSE | Enhancer element first identified in the promoter of the octopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNA | P$OCSTF.01 | 0.73 | 1004-1024 | (+) | 1 | 0.776
P$GBOX | Plant G-box/C-box bZIP proteins | P$UPRE.01 | 0.86 | 1009-1029 | (−) | 1 | 0.974
P$GBOX | Plant G-box/C-box bZIP proteins | P$TGA1.01 | 0.90 | 1010-1030 | (+) | 1 | 0.991
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 1011-1027 | (+) | 1 | 0.828
P$OPAQ | Opaque-2 like transcriptional activators | P$O2.01 | 0.87 | 1011-1027 | (−) | 1 | 0.99
P$OPAQ | Opaque-2 like transcriptional activators | P$O2_GCN4.01 | 0.81 | 1012-1028 | (+) | 0.95 | 0.893
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.01 | 0.77 | 1013-1037 | (−) | 1 | 0.771
P$LEGB | Legumin Box family | P$LEGB.01 | 0.65 | 1025-1051 | (+) | 1 | 0.656
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 1042-1052 | (+) | 0.83 | 0.902
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 1042-1052 | (−) | 1 | 1
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 1045-1061 | (+) | 1 | 0.904
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 1070-1080 | (−) | 0.97 | 0.949
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 1093-1107 | (−) | 1 | 0.952
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 1098-1114 | (−) | 0.75 | 0.843
P$CARM | CA-rich motif | P$CARICH.01 | 0.78 | 1102-1120 | (−) | 1 | 0.791
P$MADS | MADS box proteins | P$SQUA.01 | 0.90 | 1108-1128 | (−) | 1 | 0.928
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 1111-1125 | (+) | 1 | 0.961
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1112-1128 | (+) | 1 | 0.968
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 1130-1156 | (−) | 1 | 0.922
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 1135-1145 | (+) | 1 | 1
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 1138-1164 | (−) | 1 | 0.914
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.01 | 0.77 | 1138-1162 | (+) | 0.75 | 0.794
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 1141-1157 | (+) | 0.75 | 0.833

TABLE 2 Boxes and Motifs identified in the permutated sequence of the PvARC5 promoter. Preferably associated boxes are annotated in line 38, 43, 116, 121, 124, 128, 129, 137, 138, 143, 145, 146, 147, 151, 152, 153, 156, 162, 165, 175, 184, 186, 188, 203 and 205 of tables 1 and 2. Essential boxes are annotated in line 83, 111, 112, 172 and 201 of tables 1 and 2.

PvARC5 promoter permutated
Family | Further Family Information | Matrix | Opt. | Position from-to | Strand | Core sim. | Matrix sim.
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 9-25 | (+) | 1 | 0.862
P$IDDF | ID domain factors | P$ID1.01 | 0.92 | 36-48 | (−) | 1 | 0.922
P$MYBL | MYB-like proteins | P$ATMYB77.01 | 0.87 | 47-63 | (+) | 1 | 0.887
P$RAV5 | 5′-part of bipartite RAV1 binding site | P$RAV1-5.01 | 0.96 | 48-58 | (+) | 1 | 0.96
P$MYBL | MYB-like proteins | P$GAMYB.01 | 0.91 | 52-68 | (−) | 1 | 0.932
P$STKM | Storekeeper motif | P$STK.01 | 0.85 | 58-72 | (+) | 0.79 | 0.894
P$MYBL | MYB-like proteins | P$MYBPH3.01 | 0.80 | 59-75 | (+) | 0.75 | 0.806
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.02 | 0.76 | 62-78 | (+) | 0.89 | 0.791
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 75-85 | (+) | 0.97 | 0.988
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 84-94 | (−) | 1 | 0.963
P$MIIG | MYB IIG-type binding sites | P$PALBOXL.01 | 0.80 | 87-101 | (+) | 0.84 | 0.806
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 106-116 | (+) | 1 | 0.99
P$GAPB | GAP-Box (light response elements) | P$GAP.01 | 0.88 | 108-122 | (+) | 0.81 | 0.884
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 110-120 | (−) | 1 | 0.963
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 121-137 | (−) | 1 | 0.939
P$GAGA | GAGA elements | P$GAGABP.01 | 0.75 | 125-149 | (−) | 0.75 | 0.764
P$NCS2 | Nodulin consensus sequence 2 | P$NCS2.01 | 0.79 | 126-140 | (+) | 1 | 0.799
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 144-154 | (−) | 1 | 0.923
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 147-157 | (−) | 1 | 0.928
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 149-165 | (+) | 1 | 0.78
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 151-167 | (+) | 1 | 0.825
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 164-174 | (−) | 1 | 0.898
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 175-191 | (+) | 0.75 | 0.872
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 177-187 | (+) | 0.83 | 0.902
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 177-187 | (−) | 1 | 1
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 184-200 | (+) | 0.75 | 0.797
P$TELO | Telo box (plant interstitial telomere motifs) | P$ATPURA.01 | 0.85 | 186-200 | (−) | 0.75 | 0.857
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 188-204 | (−) | 1 | 0.843
P$NCS2 | Nodulin consensus sequence 2 | P$NCS2.01 | 0.79 | 213-227 | (−) | 1 | 0.826
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 221-237 | (+) | 1 | 1
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 233-251 | (+) | 0.75 | 0.824
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 238-254 | (+) | 0.82 | 0.798
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 243-261 | (−) | 0.75 | 0.824
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 250-260 | (+) | 1 | 0.892
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 257-273 | (+) | 1 | 0.881
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 261-271 | (−) | 1 | 0.851
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 264-280 | (+) | 1 | 0.774
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 267-283 | (+) | 1 | 0.872
P$MYBL | MYB-like proteins | P$GAMYB.01 | 0.91 | 289-305 | (−) | 1 | 0.919
P$SPF1 | Sweet potato DNA-binding factor with two WRKY-domains | P$SP8BF.01 | 0.87 | 298-310 | (−) | 1 | 0.872
P$BRRE | Brassinosteroid (BR) response element | P$BZR1.01 | 0.95 | 303-319 | (−) | 1 | 0.953
P$GBOX | Plant G-box/C-box bZIP proteins | P$TGA1.01 | 0.90 | 327-347 | (−) | 1 | 0.909
P$GTBX | GT-box elements | P$GT1.01 | 0.85 | 337-353 | (+) | 1 | 0.854
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 337-353 | (−) | 1 | 0.935
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 342-358 | (−) | 1 | 0.896
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 343-353 | (−) | 1 | 0.869
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 343-353 | (−) | 0.88 | 0.915
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 344-360 | (−) | 0.75 | 0.827
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 345-355 | (+) | 0.97 | 0.945
P$OPAQ | Opaque-2 like transcriptional activators | P$O2.01 | 0.87 | 351-367 | (−) | 1 | 0.919
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 362-378 | (−) | 0.75 | 0.797
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 367-377 | (−) | 1 | 0.788
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 367-377 | (+) | 1 | 0.926
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 367-383 | (+) | 1 | 0.894
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 369-385 | (+) | 1 | 0.827
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 371-381 | (−) | 1 | 1
P$MYBL | MYB-like proteins | P$ATMYB77.01 | 0.87 | 376-392 | (−) | 0.86 | 0.924
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 387-401 | (+) | 1 | 0.851
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 392-410 | (+) | 1 | 0.864
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 396-412 | (−) | 1 | 0.852
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 397-407 | (+) | 1 | 0.911
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 401-411 | (−) | 1 | 0.916
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 403-419 | (−) | 1 | 0.9
P$MYBS | MYB proteins with single DNA binding repeat | P$OSMYBS.01 | 0.82 | 416-432 | (+) | 0.75 | 0.829
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 420-436 | (−) | 0.75 | 0.821
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 426-442 | (+) | 0.75 | 0.819
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 426-442 | (−) | 1 | 0.902
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 428-444 | (−) | 1 | 0.772
P$OCSE | Enhancer element first identified in the promoter of the octopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNA | P$OCSL.01 | 0.69 | 428-448 | (+) | 0.77 | 0.692
P$TELO | Telo box (plant interstitial telomere motifs) | P$ATPURA.01 | 0.85 | 440-454 | (−) | 0.75 | 0.854
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 455-465 | (+) | 0.83 | 0.902
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 455-465 | (−) | 1 | 0.979
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 461-479 | (−) | 0.75 | 0.815
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 468-478 | (+) | 1 | 0.901
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 473-489 | (−) | 1 | 0.913
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 476-490 | (−) | 1 | 0.889
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 489-505 | (−) | 0.75 | 0.825
P$L1BX | L1 box, motif for L1 layer-specific expression | P$HDG9.01 | 0.77 | 491-507 | (+) | 1 | 0.791
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 492-506 | (−) | 1 | 0.902
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 498-512 | (+) | 0.76 | 0.862
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 499-513 | (−) | 1 | 0.909
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 499-517 | (−) | 1 | 0.827
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 509-525 | (−) | 1 | 0.885
P$SPF1 | Sweet potato DNA-binding factor with two WRKY-domains | P$SP8BF.01 | 0.87 | 520-532 | (−) | 1 | 0.905
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 526-542 | (−) | 1 | 0.936
P$GARP | Myb-related DNA binding proteins (Golden2, ARR, Psr) | P$ARR10.01 | 0.97 | 540-548 | (+) | 1 | 0.976
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 558-568 | (−) | 1 | 0.775
P$L1BX | L1 box, motif for L1 layer-specific expression | P$PDF2.01 | 0.85 | 558-574 | (−) | 1 | 0.865
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 558-568 | (−) | 0.88 | 0.927
P$EINL | Ethylen insensitive 3 like factors | P$TEIL.01 | 0.92 | 572-580 | (+) | 1 | 0.921
P$SBPD | SBP-domain proteins | P$SBP.01 | 0.88 | 573-589 | (+) | 1 | 0.885
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 583-593 | (+) | 0.94 | 0.977
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 583-593 | (−) | 0.83 | 0.94
P$MYCL | Myc-like basic helix-loop-helix binding factors | P$MYCRS.01 | 0.93 | 591-609 | (−) | 0.86 | 0.958
P$OPAQ | Opaque-2 like transcriptional activators | P$O2_GCN4.01 | 0.81 | 593-609 | (+) | 1 | 0.838
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.02 | 0.89 | 603-619 | (+) | 1 | 0.89
P$L1BX | L1 box, motif for L1 layer-specific expression | P$HDG9.01 | 0.77 | 607-623 | (+) | 1 | 0.772
P$IBOX | Plant I-Box sites | P$IBOX.01 | 0.81 | 610-626 | (+) | 0.75 | 0.824
P$MYBS | MYB proteins with single DNA binding repeat | P$MYBST1.01 | 0.90 | 613-629 | (−) | 1 | 0.953
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 616-632 | (+) | 1 | 0.942
P$TEFB | TEF-box | P$TEF1.01 | 0.76 | 616-636 | (+) | 0.96 | 0.778
P$MYBS | MYB proteins with single DNA binding repeat | P$TAMYB80.01 | 0.83 | 625-641 | (−) | 1 | 0.861
O$PTBP Plant TATA O$PTATA.02 0.90 631-645 (+) 1 0.927binding protein factor O$PTBP Plant TATA O$PTATA.02 0.90 632-646 (−) 10.929 binding protein factor P$L1BX L1 box, motif P$HDG9.01 0.77 648-664(+) 1 0.822 for L1 layer- specific expression P$HMGF High mobilityP$HMG_IY.01 0.89 649-663 (−) 1 0.923 group factors P$DOFF DNA bindingP$PBF.01 0.97 654-670 (+) 1 0.979 with one finger (DOF) P$LREM Lightresponsive P$RAP22.01 0.85 682-692 (−) 1 0.975 element motif, notmodulated by different light qualities P$MYBL MYB-like proteinsP$CARE.01 0.83 689-705 (+) 1 0.884 P$TEFB TEF-box P$TEF1.01 0.76 696-716(−) 0.84 0.779 P$MYBL MYB-like proteins P$CARE.01 0.83 699-715 (−) 10.88 P$LEGB Legumin Box P$RY.01 0.87 704-730 (+) 1 0.94 family P$GBOXPlant G-box/C- P$BZIP910.01 0.77 716-736 (−) 0.75 0.856 box bZIPproteins P$GBOX Plant G-box/C- P$ROM.01 0.85 717-737 (+) 1 1 box bZIPproteins P$ABRE ABA response P$ABF1.03 0.82 719-735 (−) 0.75 0.857elements P$GBOX Plant G-box/C- P$BZIP910.02 0.84 722-742 (−) 0.75 0.862box bZIP proteins P$GBOX Plant G-box/C- P$HBP1B.01 0.83 734-754 (+) 0.770.852 box bZIP proteins P$MYCL Myc-like basic P$MYCRS.01 0.93 739-757(−) 0.86 0.953 helix-loop-helix binding factors P$ABRE ABA responseP$ABF1.01 0.79 741-757 (−) 0.75 0.796 elements P$OPAQ Opque-2 likeP$O2_GCN4.01 0.81 741-757 (+) 1 0.871 transcriptional activators P$OPAQOpaque-2 like P$GCN4.01 0.81 745-761 (−) 1 0.85 transcriptionalactivators P$AREF Auxin response P$ARE.01 0.93 747-759 (+) 1 0.941element P$MYBL MYB-like proteins P$GAMYB.01 0.91 754-770 (+) 1 0.933O$INRE Core promoter O$DINR.01 0.94 757-767 (+) 1 0.943 initiatorelements P$WBXF W Box family P$WRKY.01 0.92 780-796 (+) 1 0.942 P$TEFBTEF-box P$TEF1.01 0.76 783-803 (−) 0.84 0.779 P$MYBL MYB-like proteinsP$CARE.01 0.83 786-802 (−) 1 0.876 P$LEGB Legumin Box P$RY.01 0.87788-814 (−) 1 0.929 family P$LEGB Legumin Box P$RY.01 0.87 791-817 (+) 10.984 family P$ROOT Root hair- P$RHE.01 0.77 796-820 (+) 1 0.812specific cis- elements in angiosperms P$GBOX 
Plant G-box/C- P$CPRF.010.95 803-823 (−) 1 0.989 box bZIP proteins P$GBOX Plant G-box/C-P$CPRF.01 0.95 804-824 (+) 1 0.98 box bZIP proteins P$MYCL Myc-likebasic P$MYCRS.01 0.93 804-822 (−) 1 0.956 helix-loop-helix bindingfactors P$ABRE ABA response P$ABRE.01 0.82 805-821 (+) 1 0.874 elementsP$MYCL Myc-like basic P$PIF3.01 0.82 805-823 (+) 1 0.922helix-loop-helix binding factors P$OPAQ Opaque-2 like P$RITA1.01 0.95805-821 (−) 1 0.992 transcriptional activators P$ABRE ABA responseP$ABF1.03 0.82 806-822 (−) 1 0.977 elements P$OPAQ Opaque-2 likeP$RITA1.01 0.95 806-822 (+) 1 0.973 transcriptional activators P$OCSEEnhancer element P$OCSL.01 0.69 809-829 (−) 1 0.819 first identified inthe promoter of the octopine synthase gene (OCS) of the Agrobacteriumtumefaciens T- DNA P$GTBX GT-box elements P$S1F.01 0.79 823-839 (−) 10.802 P$LFYB LFY binding P$LFY.01 0.93 839-851 (−) 0.91 0.936 siteP$LEGB Legumin Box P$RY.01 0.87 840-866 (−) 1 0.948 family P$LEGBLegumin Box P$RY.01 0.87 843-869 (+) 1 0.966 family P$LEGB Legumin BoxP$IDE1.01 0.77 847-873 (+) 1 0.779 family P$GBOX Plant G-box/C-P$BZIP910.01 0.77 855-875 (−) 0.75 0.856 box bZIP proteins P$GBOX PlantG-box/C- P$ROM.01 0.85 856-876 (+) 1 1 box bZIP proteins P$ABRE ABAresponse P$ABF1.03 0.82 858-874 (−) 0.75 0.857 elements P$GCCF GCC boxfamily P$ERE_JERE.01 0.85 870-882 (−) 0.81 0.86 P$HEAT Heat shockP$HSE.01 0.81 880-894 (−) 1 0.827 factors P$MYBS MYB proteinsP$ZMMRP1.01 0.79 881-897 (+) 0.81 0.867 with single DNA binding repeatP$LEGB Legumin Box P$RY.01 0.87 895-921 (+) 1 0.924 family P$GBOX PlantG-box/C- P$BZIP910.01 0.77 907-927 (−) 0.75 0.856 box bZIP proteinsP$GBOX Plant G-box/C- P$ROM.01 0.85 908-928 (+) 1 0.938 box bZIPproteins P$ABRE ABA response P$ABF1.03 0.82 910-926 (−) 0.75 0.864elements P$GBOX Plant G-box/C- P$BZIP910.02 0.84 913-933 (−) 0.75 0.871box bZIP proteins P$SBPD SBP-domain P$SBP.01 0.88 939-955 (+) 1 0.887proteins P$EINL Ethylen insensitive P$TEIL.01 0.92 942-950 (+) 0.840.922 3 like factors P$MADS 
MADS box P$SQUA.01 0.90 960-980 (−) 1 0.908proteins P$L1BX L1 box, motif P$PDF2.01 0.85 963-979 (+) 1 0.856 for L1layer- specific expression P$LREM Light responsive P$RAP22.01 0.85972-982 (+) 1 0.858 element motif, not modulated by different lightqualities O$PTBP Plant TATA O$PTATA.01 0.88 974-988 (−) 0.83 0.905binding protein factor O$VTBP Vertebrate TA- O$ATATA.01 0.78 974-990 (+)0.75 0.83 TA binding protein factor O$VTBP Vertebrate TA- O$MTATA.010.84 976-992 (+) 1 0.855 TA binding protein factor P$MYBL MYB-likeproteins P$MYBPH3.02 0.76 983-999 (−) 1 0.867 P$SUCB Sucrose boxP$SUCROSE.01 0.81  984-1002 (−) 1 0.81 P$AHBP Arabidopsis P$ATHB1.010.90  991-1001 (+) 1 0.989 homeobox protein P$AHBP ArabidopsisP$HAHB4.01 0.87  991-1001 (−) 1 0.943 homeobox protein P$HMGF Highmobility P$HMG_IY.01 0.89  992-1006 (+) 1 0.913 group factors P$OCSEEnhancer element P$OCSL.01 0.69 1004-1024 (+) 1 0.827 first identifiedin the promoter of the octopine synthase gene (OCS) of the Agrobacteriumtumefaciens T- DNA P$GBOX Plant G-box/C- P$UPRE.01 0.86 1009-1029 (−) 10.974 box bZIP proteins P$GBOX Plant G-box/C- P$TGA1.01 0.90 1010-1030(+) 1 0.991 box bZIP proteins P$ABRE ABA response P$ABF1.03 0.821011-1027 (+) 1 0.828 elements P$OPAQ Opaque-2 like P$O2.01 0.871011-1027 (−) 1 0.99 transcriptional activators P$OPAQ Opaque-2 likeP$O2_GCN4.01 0.81 1012-1028 (+) 0.95 0.893 transcriptional activatorsP$ROOT Root hair- P$RHE.01 0.77 1013-1037 (−) 1 0.771 specific cis-elements in angiosperms P$LEGB Legumin Box P$LEGB.01 0.65 1025-1051 (+)1 0.656 family P$AHBP Arabidopsis P$ATHB5.01 0.89 1042-1052 (+) 0.830.902 homeobox protein P$AHBP Arabidopsis P$ATHB5.01 0.89 1042-1052 (−)1 1 homeobox protein P$GTBX GT-box elements P$SBF1.01 0.87 1045-1061 (+)1 0.888 P$GTBX GT-box elements P$SBF1.01 0.87 1046-1062 (−) 1 0.888P$IBOX Plant I-Box P$GATA.01 0.93 1060-1076 (+) 1 0.949 sites O$INRECore promoter O$DINR.01 0.94 1070-1080 (−) 0.97 0.949 initiator elementsP$NACF Plant specific P$TANAC69.01 0.68 
1078-1100 (+) 1 0.775 NAC [NAM(no apical meristem), ATAF172, CUC2 (cup- shaped cotyledons 2)]transcription factors P$CCAF Circadian control P$CCA1.01 0.85 1093-1107(−) 1 0.949 factors P$MADS MADS box P$SQUA.01 0.90 1097-1117 (+) 1 0.908proteins P$CARM CA-rich motif P$CARICH.01 0.78 1102-1120 (−) 1 0.791P$MADS MADS box P$SQUA.01 0.90 1108-1128 (−) 1 0.928 proteins O$PTBPPlant TATA O$PTATA.01 0.88 1111-1125 (+) 1 0.961 binding protein factorO$VTBP Vertebrate TA- O$VTATA.01 0.90 1112-1128 (+) 1 0.968 TA bindingprotein factor P$LEGB Legumin Box P$RY.01 0.87 1130-1156 (−) 1 0.932family P$LEGB Legumin Box P$RY.01 0.87 1138-1164 (−) 1 0.914 familyP$ROOT Root hair- P$RHE.01 0.77 1138-1162 (+) 0.75 0.794 specific cis-elements in angiosperms P$L1BX L1 box, motif P$ATML1.01 0.82 1141-1157(+) 0.75 0.833 for L1 layer- specific expression
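For readers who want to work with these rows programmatically, the following is a minimal sketch of one way to hold a table row as a record. It assumes that the "Opt." column is the optimized matrix-similarity threshold of the matrix and that a match is listed when its matrix similarity is at least that value; this reading is consistent with the rows shown but is an assumption, not a statement from the specification. The class name `BoxMatch` and its helper methods are hypothetical; the four sample rows are copied from Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxMatch:
    """One table row: a matrix match on the promoter sequence (hypothetical record type)."""
    family: str        # e.g. "P$PSRE"
    matrix: str        # e.g. "P$GAAA.01"
    opt: float         # "Opt." column, assumed to be the optimized threshold
    start: int         # "Position from"
    end: int           # "Position to"
    strand: str        # "+" or "-"
    core_sim: float    # "Core sim."
    matrix_sim: float  # "Matrix sim."

    def length(self) -> int:
        # Positions are inclusive on both ends.
        return self.end - self.start + 1

    def meets_threshold(self) -> bool:
        # Assumed criterion: matrix similarity reaches the optimized threshold.
        return self.matrix_sim >= self.opt


# Four rows copied from Table 2 (permutated PvARC5 promoter).
rows = [
    BoxMatch("P$PSRE", "P$GAAA.01", 0.83, 9, 25, "+", 1.0, 0.862),
    BoxMatch("P$IDDF", "P$ID1.01", 0.92, 36, 48, "-", 1.0, 0.922),
    BoxMatch("P$RAV5", "P$RAV1-5.01", 0.96, 48, 58, "+", 1.0, 0.96),
    BoxMatch("O$VTBP", "O$ATATA.01", 0.78, 149, 165, "+", 1.0, 0.78),
]

lengths = [r.length() for r in rows]
all_pass = all(r.meets_threshold() for r in rows)
```

With the sample rows, `lengths` holds the span of each match in base pairs and `all_pass` indicates whether every row satisfies the assumed threshold criterion.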

TABLE 3 Boxes and Motifs identified in the starting sequence of the VfSBP promoter

p-VfSBP (native)

Family | Further Family Information | Matrix | Opt. | Position from-to | Strand | Core sim. | Matrix sim.
P$MYBS | MYB proteins with single DNA binding repeat | P$MYBST1.01 | 0.90 | 12-28 | (+) | 1 | 0.918
P$GAGA | GAGA elements | P$BPC.01 | 1.00 | 25-49 | (−) | 1 | 1
P$LEGB | Legumin Box family | P$IDE1.01 | 0.77 | 80-106 | (−) | 1 | 0.805
P$GTBX | GT-box elements | P$GT3A.01 | 0.83 | 85-101 | (−) | 1 | 0.843
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 101-117 | (−) | 1 | 0.883
P$SPF1 | Sweet potato DNA-binding factor with two WRKY-domains | P$SP8BF.01 | 0.87 | 118-130 | (+) | 1 | 0.897
P$GBOX | Plant G-box/C-box bZIP proteins | P$HBP1B.01 | 0.83 | 138-158 | (+) | 1 | 0.834
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 165-181 | (−) | 0.78 | 0.788
P$NACF | Plant specific NAC [NAM (no apical meristem), ATAF172, CUC2 (cup-shaped cotyledons 2)] transcription factors | P$TANAC69.01 | 0.68 | 173-195 | (−) | 0.81 | 0.729
P$MADS | MADS box proteins | P$AGL1.01 | 0.84 | 174-194 | (−) | 0.98 | 0.862
P$MADS | MADS box proteins | P$AGL1.01 | 0.84 | 175-195 | (+) | 0.98 | 0.863
P$TCPF | DNA-binding proteins with the plant specific TCP-domain | P$ATTCP20.01 | 0.94 | 189-201 | (+) | 1 | 0.968
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.02 | 0.76 | 194-210 | (−) | 0.89 | 0.8
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 198-208 | (+) | 0.83 | 0.936
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 207-223 | (+) | 0.75 | 0.811
P$EINL | Ethylen insensitive 3 like factors | P$TEIL.01 | 0.92 | 215-223 | (−) | 0.96 | 0.924
P$GBOX | Plant G-box/C-box bZIP proteins | P$HBP1A.01 | 0.88 | 217-237 | (−) | 1 | 0.908
P$GBOX | Plant G-box/C-box bZIP proteins | P$GBF1.01 | 0.94 | 218-238 | (+) | 1 | 0.963
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 218-234 | (+) | 1 | 0.821
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 219-235 | (+) | 1 | 0.825
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.01 | 0.77 | 221-245 | (−) | 1 | 0.803
P$CE1F | Coupling element 1 binding factors | P$SBOX.01 | 0.87 | 222-234 | (−) | 0.78 | 0.916
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 233-249 | (−) | 1 | 0.916
O$PTBP | Plant TATA binding protein factor | O$PTATA.02 | 0.90 | 236-250 | (−) | 1 | 0.9
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 256-266 | (+) | 0.94 | 0.896
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 256-266 | (−) | 0.88 | 0.871
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 290-300 | (−) | 1 | 0.931
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 292-302 | (−) | 1 | 0.984
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 306-316 | (+) | 1 | 0.938
P$MYBL | MYB-like proteins | P$CARE.01 | 0.83 | 308-324 | (−) | 1 | 0.854
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 354-368 | (+) | 1 | 0.895
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 375-389 | (−) | 1 | 0.861
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 392-408 | (−) | 1 | 0.87
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 394-410 | (+) | 1 | 0.95
P$MSAE | M-phase-specific activator elements | P$MSA.01 | 0.80 | 395-409 | (−) | 0.75 | 0.808
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 415-429 | (+) | 1 | 0.811
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 421-439 | (−) | 0.75 | 0.852
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 426-442 | (+) | 1 | 0.939
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 431-447 | (−) | 0.76 | 0.782
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 453-469 | (+) | 1 | 0.958
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 468-484 | (−) | 0.82 | 0.849
P$OPAQ | Opaque-2 like transcriptional activators | P$O2_GCN4.01 | 0.81 | 486-502 | (+) | 1 | 0.818
P$OPAQ | Opaque-2 like transcriptional activators | P$O2.01 | 0.87 | 498-514 | (−) | 1 | 0.919
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 512-526 | (−) | 1 | 0.85
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 533-549 | (−) | 1 | 0.966
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 543-559 | (+) | 1 | 0.966
P$WBXF | W Box family | P$ERE.01 | 0.89 | 562-578 | (+) | 1 | 0.972
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 614-630 | (+) | 0.76 | 0.766
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 630-646 | (+) | 1 | 0.819
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 636-646 | (−) | 1 | 0.913
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 637-647 | (+) | 1 | 0.915
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 649-663 | (+) | 0.78 | 0.87
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 654-668 | (−) | 1 | 0.815
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 660-670 | (−) | 1 | 0.944
P$GAPB | GAP-Box (light response elements) | P$GAP.01 | 0.88 | 702-716 | (−) | 1 | 0.897
P$GTBX | GT-box elements | P$GT1.01 | 0.85 | 723-739 | (−) | 1 | 0.925
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 726-736 | (−) | 1 | 1
P$MYBL | MYB-like proteins | P$GAMYB.01 | 0.91 | 773-789 | (+) | 1 | 0.951
P$GTBX | GT-box elements | P$GT3A.01 | 0.83 | 775-791 | (+) | 1 | 0.899
P$MYBL | MYB-like proteins | P$CARE.01 | 0.83 | 801-817 | (−) | 1 | 0.837
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 803-819 | (−) | 1 | 0.811
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 819-835 | (−) | 0.75 | 0.874
P$MADS | MADS box proteins | P$AGL15.01 | 0.79 | 827-847 | (−) | 0.83 | 0.791
P$MADS | MADS box proteins | P$AGL15.01 | 0.79 | 828-848 | (+) | 1 | 0.895
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 843-857 | (−) | 1 | 0.883
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 844-860 | (−) | 1 | 0.948
P$CARM | CA-rich motif | P$CARICH.01 | 0.78 | 845-863 | (+) | 1 | 0.806
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 858-874 | (+) | 0.75 | 0.831
P$MYBL | MYB-like proteins | P$NTMYBAS1.01 | 0.96 | 867-883 | (+) | 1 | 0.963
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 869-885 | (+) | 1 | 0.883
P$RAV5 | 5′-part of bipartite RAV1 binding site | P$RAV1-5.01 | 0.96 | 882-892 | (+) | 1 | 0.96
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 888-898 | (−) | 1 | 1
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 897-913 | (+) | 1 | 0.886
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 906-916 | (+) | 1 | 1
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 907-917 | (−) | 1 | 0.903
P$CARM | CA-rich motif | P$CARICH.01 | 0.78 | 908-926 | (−) | 1 | 0.826
P$MYBL | MYB-like proteins | P$NTMYBAS1.01 | 0.96 | 916-932 | (−) | 1 | 0.962
P$MIIG | MYB IIG-type binding sites | P$PALBOXP.01 | 0.81 | 918-932 | (−) | 0.94 | 0.817
P$DOFF | DNA binding with one finger (DOF) | P$DOF1.01 | 0.98 | 929-945 | (−) | 1 | 0.983
P$GTBX | GT-box elements | P$GT1.01 | 0.85 | 933-949 | (+) | 0.97 | 0.854
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 944-960 | (+) | 1 | 0.829
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 959-969 | (+) | 0.75 | 0.816
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 959-969 | (−) | 1 | 0.909
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 970-980 | (+) | 1 | 0.916
P$AHBP | Arabidopsis homeobox protein | P$ATHB1.01 | 0.90 | 973-983 | (+) | 1 | 0.989
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 973-983 | (−) | 1 | 0.976
P$IDDF | ID domain factors | P$ID1.01 | 0.92 | 976-988 | (+) | 1 | 0.928
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 995-1011 | (+) | 1 | 0.96
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 1008-1018 | (+) | 1 | 0.937
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 1012-1022 | (−) | 1 | 1
P$SPF1 | Sweet potato DNA-binding factor with two WRKY-domains | P$SP8BF.01 | 0.87 | 1029-1041 | (−) | 0.78 | 0.879
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 1036-1054 | (−) | 1 | 0.822
P$AHBP | Arabidopsis homeobox protein | P$ATHB1.01 | 0.90 | 1054-1064 | (+) | 1 | 0.99
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 1054-1064 | (−) | 0.83 | 0.94
P$GTBX | GT-box elements | P$GT3A.01 | 0.83 | 1066-1082 | (+) | 1 | 0.889
O$PTBP | Plant TATA binding protein factor | O$PTATA.02 | 0.90 | 1086-1100 | (+) | 1 | 0.94
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1087-1103 | (+) | 0.89 | 0.927
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 1088-1102 | (+) | 1 | 0.958
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1089-1105 | (+) | 1 | 0.971
P$E2FF | E2F-homolog cell cycle regulators | P$E2F.01 | 0.82 | 1117-1131 | (−) | 1 | 0.833
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 1146-1162 | (+) | 1 | 0.908
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 1153-1169 | (+) | 1 | 0.8
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 1170-1186 | (−) | 1 | 0.797
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 1173-1191 | (+) | 1 | 0.813
P$MADS | MADS box proteins | P$AGL2.01 | 0.82 | 1174-1194 | (+) | 1 | 0.9
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 1189-1199 | (+) | 0.83 | 0.919
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 1229-1245 | (−) | 0.76 | 0.763
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 1234-1250 | (−) | 0.94 | 0.88
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 1241-1255 | (+) | 1 | 0.964
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1242-1258 | (+) | 1 | 0.967
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 1265-1281 | (−) | 0.76 | 0.762
P$GTBX | GT-box elements | P$GT3A.01 | 0.83 | 1265-1281 | (+) | 0.75 | 0.839
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 1274-1284 | (−) | 1 | 0.928
P$OCSE | Enhancer element first identified in the promoter of the octopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNA | P$OCSL.01 | 0.69 | 1278-1298 | (+) | 0.77 | 0.732
P$MYCL | Myc-like basic helix-loop-helix binding factors | P$MYCRS.01 | 0.93 | 1284-1302 | (−) | 0.86 | 0.963
P$TALE | TALE (3-aa acid loop extension) class homeodomain proteins | P$KN1_KIP.01 | 0.88 | 1289-1301 | (−) | 1 | 1
P$AREF | Auxin response element | P$SEBF.01 | 0.96 | 1292-1304 | (+) | 1 | 0.98
P$MSAE | M-phase-specific activator elements | P$MSA.01 | 0.80 | 1295-1309 | (−) | 0.75 | 0.818
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 1296-1312 | (−) | 1 | 0.776
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 1310-1326 | (−) | 0.94 | 0.876
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 1319-1329 | (+) | 1 | 0.93
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 1323-1339 | (−) | 1 | 0.881
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 1327-1337 | (−) | 1 | 0.936
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 1338-1354 | (+) | 1 | 0.896
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 1338-1356 | (−) | 1 | 0.819
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 1345-1355 | (+) | 0.83 | 0.902
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 1345-1355 | (−) | 1 | 0.998
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 1354-1364 | (−) | 1 | 0.916
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1376-1392 | (−) | 1 | 0.949
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 1377-1391 | (+) | 1 | 0.952
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 1379-1393 | (−) | 1 | 0.883
P$IBOX | Plant I-Box sites | P$IBOX.01 | 0.81 | 1399-1415 | (−) | 0.75 | 0.822
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 1417-1433 | (−) | 1 | 0.86
P$IBOX | Plant I-Box sites | P$IBOX.01 | 0.81 | 1419-1435 | (−) | 0.75 | 0.824
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 1429-1445 | (−) | 1 | 0.958
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 1457-1473 | (+) | 0.82 | 0.798
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.02 | 0.77 | 1458-1482 | (+) | 0.75 | 0.786
P$LFYB | LFY binding site | P$LFY.01 | 0.93 | 1486-1498 | (−) | 0.91 | 0.987
P$CAAT | CCAAT binding factors | P$CAAT.01 | 0.97 | 1490-1498 | (−) | 1 | 0.982
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 1526-1540 | (+) | 1 | 0.833
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 1550-1560 | (−) | 1 | 0.93
P$IDDF | ID domain factors | P$ID1.01 | 0.92 | 1563-1575 | (+) | 1 | 0.952
P$NCS2 | Nodulin consensus sequence 2 | P$NCS2.01 | 0.79 | 1565-1579 | (+) | 0.75 | 0.845
O$VTBP | Vertebrate TATA binding protein factor | O$MTATA.01 | 0.84 | 1570-1586 | (+) | 1 | 0.846
P$DOFF | DNA binding with one finger (DOF) | P$PBF.01 | 0.97 | 1571-1587 | (+) | 1 | 0.988
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 1572-1598 | (−) | 1 | 0.898
P$MADS | MADS box proteins | P$AGL3.01 | 0.83 | 1637-1657 | (+) | 1 | 0.851
P$MYBL | MYB-like proteins | P$ATMYB77.01 | 0.87 | 1654-1670 | (−) | 1 | 0.909
P$URNA | Upstream sequence element of U-snRNA genes | P$USE.01 | 0.75 | 1659-1675 | (+) | 1 | 0.758
P$AHBP | Arabidopsis homeobox protein | P$ATHB1.01 | 0.90 | 1671-1681 | (−) | 1 | 0.989
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 1671-1681 | (+) | 1 | 0.955
P$OCSE | Enhancer element first identified in the promoter of the octopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNA | P$OCSL.01 | 0.69 | 1677-1697 | (+) | 1 | 0.763
P$GBOX | Plant G-box/C-box bZIP proteins | P$GBF1.01 | 0.94 | 1682-1702 | (−) | 1 | 0.968
P$ABRE | ABA response elements | P$ABRE.01 | 0.82 | 1685-1701 | (−) | 1 | 0.855
P$BRRE | Brassinosteroid (BR) response element | P$BZR1.01 | 0.95 | 1696-1712 | (−) | 1 | 0.954
P$GBOX | Plant G-box/C-box bZIP proteins | P$GBF1.01 | 0.94 | 1696-1716 | (−) | 1 | 0.963
P$TEFB | TEF-box | P$TEF1.01 | 0.76 | 1696-1716 | (−) | 0.96 | 0.826
P$OPAQ | Opaque-2 like transcriptional activators | P$O2_GCN4.01 | 0.81 | 1698-1714 | (−) | 0.95 | 0.824
P$DPBF | Dc3 promoter binding factors | P$DPBF.01 | 0.89 | 1700-1710 | (+) | 1 | 0.943
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 1701-1727 | (−) | 1 | 0.887
P$LEGB | Legumin Box family | P$IDE1.01 | 0.77 | 1708-1734 | (+) | 1 | 0.871
P$MYBS | MYB proteins with single DNA binding repeat | P$TAMYB80.01 | 0.83 | 1727-1743 | (+) | 1 | 0.85
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.02 | 0.77 | 1740-1764 | (+) | 1 | 0.786
P$GBOX | Plant G-box/C-box bZIP proteins | P$EMBP1.01 | 0.84 | 1747-1767 | (−) | 1 | 0.84
P$ABRE | ABA response elements | P$ABRE.01 | 0.82 | 1750-1766 | (−) | 1 | 0.831
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1756-1772 | (+) | 1 | 0.963
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 1765-1781 | (−) | 1 | 0.781
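The native and permutated VfSBP tables can be compared box by box. The sketch below is illustrative only: it keys each match on (matrix, from, to, strand) and uses set operations to list boxes retained, lost and newly created by permutation. The function names `compare_boxes` and `missing_essential` are hypothetical, as is the choice of key; the sample tuples are copied from the first rows of Tables 3 and 4. Which boxes are essential is defined by the row annotations of the tables, so the `essential` set used here is a placeholder and not taken from the specification.

```python
# Key for one match: (matrix, position from, position to, strand).
Box = tuple[str, int, int, str]


def compare_boxes(native: set[Box], permutated: set[Box]) -> dict[str, set[Box]]:
    """Split boxes into those retained, lost, or gained by the permutation."""
    return {
        "retained": native & permutated,
        "lost": native - permutated,
        "gained": permutated - native,
    }


def missing_essential(essential: set[Box], permutated: set[Box]) -> set[Box]:
    """Return essential boxes that are absent from the permutated sequence."""
    return essential - permutated


# First rows of Table 3 (native p-VfSBP), positions 12 to 158.
native = {
    ("P$MYBST1.01", 12, 28, "+"),
    ("P$BPC.01", 25, 49, "-"),
    ("P$SP8BF.01", 118, 130, "+"),
    ("P$HBP1B.01", 138, 158, "+"),
}

# Corresponding rows of Table 4 (p-VfSBP_perm), positions 12 to 170.
permutated = {
    ("P$MYBST1.01", 12, 28, "+"),
    ("P$AGP1.01", 25, 35, "-"),
    ("P$BPC.01", 25, 49, "-"),
    ("P$AGP1.01", 26, 36, "+"),
    ("P$HBP1B.01", 138, 158, "+"),
    ("P$ERE.01", 154, 170, "-"),
}

result = compare_boxes(native, permutated)

# Placeholder only: treats one retained box as "essential" for illustration.
essential = {("P$BPC.01", 25, 49, "-")}
absent = missing_essential(essential, permutated)
```

A permutated sequence would pass this kind of check when `absent` is empty, that is, when every box designated essential is still found at its position after permutation.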

TABLE 4 Boxes and Motifs identified in the permutated sequence of the VfSBP promoter. Preferably associated boxes are annotated in lines 8, 14, 26, 56, 58, 59, 66, 121, 144, 148, 158, 185, 200, 201, 211, 215, 218, 219, 220, 225, 226, 228 of tables 3 and 4. Essential boxes are annotated in lines 130, 132 and 146 of tables 3 and 4.

p-VfSBP_perm

Family | Further Family Information | Matrix | Opt. | Position from-to | Strand | Core sim. | Matrix sim.
P$MYBS | MYB proteins with single DNA binding repeat | P$MYBST1.01 | 0.90 | 12-28 | (+) | 1 | 0.918
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 25-35 | (−) | 1 | 0.914
P$GAGA | GAGA elements | P$BPC.01 | 1.00 | 25-49 | (−) | 1 | 1
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 26-36 | (+) | 1 | 0.914
P$LEGB | Legumin Box family | P$IDE1.01 | 0.77 | 80-106 | (−) | 1 | 0.805
P$GTBX | GT-box elements | P$GT3A.01 | 0.83 | 85-101 | (−) | 1 | 0.843
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 101-117 | (−) | 1 | 0.883
P$GBOX | Plant G-box/C-box bZIP proteins | P$HBP1B.01 | 0.83 | 138-158 | (+) | 1 | 0.834
P$WBXF | W Box family | P$ERE.01 | 0.89 | 154-170 | (−) | 1 | 0.935
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 165-181 | (−) | 0.78 | 0.788
P$NACF | Plant specific NAC [NAM (no apical meristem), ATAF172, CUC2 (cup-shaped cotyledons 2)] transcription factors | P$TANAC69.01 | 0.68 | 173-195 | (−) | 0.81 | 0.728
P$MADS | MADS box proteins | P$AGL1.01 | 0.84 | 174-194 | (−) | 0.98 | 0.856
P$MADS | MADS box proteins | P$AGL1.01 | 0.84 | 175-195 | (+) | 0.98 | 0.844
P$TCPF | DNA-binding proteins with the plant specific TCP-domain | P$ATTCP20.01 | 0.94 | 189-201 | (+) | 1 | 0.968
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.02 | 0.76 | 194-210 | (−) | 0.89 | 0.795
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 198-208 | (+) | 0.83 | 0.936
P$EINL | Ethylen insensitive 3 like factors | P$TEIL.01 | 0.92 | 215-223 | (−) | 0.96 | 0.924
P$GBOX | Plant G-box/C-box bZIP proteins | P$HBP1A.01 | 0.88 | 217-237 | (−) | 1 | 0.908
P$GBOX | Plant G-box/C-box bZIP proteins | P$GBF1.01 | 0.94 | 218-238 | (+) | 1 | 0.963
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 218-234 | (+) | 1 | 0.821
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 219-235 | (+) | 1 | 0.825
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.01 | 0.77 | 221-245 | (−) | 1 | 0.803
P$CE1F | Coupling element 1 binding factors | P$SBOX.01 | 0.87 | 222-234 | (−) | 0.78 | 0.916
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 233-249 | (−) | 1 | 0.939
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 245-261 | (−) | 1 | 0.963
P$MYBS | MYB proteins with single DNA binding repeat | P$HVMCB1.01 | 0.93 | 248-264 | (+) | 1 | 0.957
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 256-266 | (+) | 0.94 | 0.896
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 256-266 | (−) | 0.88 | 0.871
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 260-276 | (+) | 1 | 0.819
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 290-300 | (−) | 1 | 0.931
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 292-302 | (−) | 1 | 0.984
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 293-303 | (+) | 1 | 0.915
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 306-316 | (+) | 1 | 0.938
P$MYBL | MYB-like proteins | P$CARE.01 | 0.83 | 308-324 | (−) | 1 | 0.854
P$MYBL | MYB-like proteins | P$ATMYB77.01 | 0.87 | 319-335 | (+) | 1 | 0.87
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 322-332 | (+) | 1 | 0.969
P$MADS | MADS box proteins | P$AGL15.01 | 0.79 | 345-365 | (+) | 0.85 | 0.825
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 354-368 | (+) | 1 | 0.895
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 375-389 | (−) | 1 | 0.861
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 392-408 | (−) | 1 | 0.87
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 394-410 | (+) | 1 | 0.95
P$MSAE | M-phase-specific activator elements | P$MSA.01 | 0.80 | 395-409 | (−) | 0.75 | 0.808
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 402-416 | (−) | 1 | 0.929
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 404-418 | (+) | 1 | 0.871
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 407-417 | (−) | 1 | 0.901
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 411-421 | (+) | 1 | 0.916
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 415-429 | (+) | 1 | 0.811
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 421-439 | (−) | 0.75 | 0.849
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 431-447 | (−) | 0.76 | 0.782
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 453-469 | (+) | 1 | 0.958
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 468-484 | (−) | 0.82 | 0.849
P$OPAQ | Opaque-2 like transcriptional activators | P$O2_GCN4.01 | 0.81 | 486-502 | (+) | 1 | 0.818
P$OPAQ | Opaque-2 like transcriptional activators | P$O2.01 | 0.87 | 498-514 | (−) | 1 | 0.919
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 512-526 | (−) | 1 | 0.824
P$NCS2 | Nodulin consensus sequence 2 | P$NCS2.01 | 0.79 | 525-539 | (−) | 0.75 | 0.815
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 533-549 | (−) | 1 | 0.966
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 543-559 | (+) | 1 | 0.966
P$WBXF | W Box family | P$ERE.01 | 0.89 | 562-578 | (+) | 1 | 0.972
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 614-630 | (+) | 0.76 | 0.766
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 630-646 | (+) | 1 | 0.819
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 636-646 | (−) | 1 | 0.913
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 637-647 | (+) | 1 | 0.921
P$MYBL | MYB-like proteins | P$GAMYB.01 | 0.91 | 640-656 | (−) | 1 | 0.918
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 649-663 | (+) | 0.78 | 0.87
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 654-668 | (−) | 1 | 0.815
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 660-670 | (−) | 1 | 0.944
P$PREM | Motifs of plastid response elements | P$MGPROTORE.01 | 0.77 | 691-721 | (−) | 1 | 0.789
P$GAPB | GAP-Box (light response elements) | P$GAP.01 | 0.88 | 702-716 | (−) | 1 | 0.897
P$GTBX | GT-box elements | P$GT1.01 | 0.85 | 723-739 | (−) | 1 | 0.925
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 726-736 | (−) | 1 | 1
P$CARM | CA-rich motif | P$CARICH.01 | 0.78 | 731-749 | (+) | 1 | 0.855
P$MYCL | Myc-like basic helix-loop-helix binding factors | P$ICE.01 | 0.95 | 734-752 | (+) | 0.95 | 0.961
P$MYBL | MYB-like proteins | P$GAMYB.01 | 0.91 | 773-789 | (+) | 1 | 0.951
P$GTBX | GT-box elements | P$GT3A.01 | 0.83 | 775-791 | (+) | 1 | 0.899
P$MYBL | MYB-like proteins | P$CARE.01 | 0.83 | 801-817 | (−) | 1 | 0.837
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 803-819 | (−) | 1 | 0.811
P$L1BX | L1 box, motif for L1 layer-specific expression | P$PDF2.01 | 0.85 | 814-830 | (−) | 1 | 0.869
P$GTBX | GT-box elements | P$GT1.01 | 0.85 | 815-831 | (−) | 0.97 | 0.854
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 819-835 | (−) | 0.75 | 0.874
P$MADS | MADS box proteins | P$AGL15.01 | 0.79 | 828-848 | (+) | 1 | 0.857
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 843-857 | (−) | 1 | 0.883
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 844-860 | (−) | 1 | 0.948
P$CARM | CA-rich motif | P$CARICH.01 | 0.78 | 845-863 | (+) | 1 | 0.806
P$MYBL | MYB-like proteins | P$CARE.01 | 0.83 | 849-865 | (−) | 1 | 0.876
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 869-885 | (+) | 1 | 0.883
P$RAV5 | 5′-part of bipartite RAV1 binding site | P$RAV1-5.01 | 0.96 | 882-892 | (+) | 1 | 0.96
P$L1BX | L1 box, motif for L1 layer-specific expression | P$PDF2.01 | 0.85 | 884-900 | (−) | 0.85 | 0.853
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 888-898 | (−) | 1 | 1
P$MYBL | MYB-like proteins | P$ATMYB77.01 | 0.87 | 895-911 | (+) | 1 | 0.962
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 897-913 | (+) | 1 | 0.883
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 906-916 | (+) | 1 | 1
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 907-917 | (−) | 1 | 0.903
P$CARM | CA-rich motif | P$CARICH.01 | 0.78 | 908-926 | (−) | 1 | 0.826
P$MYBL | MYB-like proteins | P$NTMYBAS1.01 | 0.96 | 916-932 | (−) | 1 | 0.962
P$MIIG | MYB IIG-type binding sites | P$PALBOXP.01 | 0.81 | 918-932 | (−) | 0.94 | 0.817
P$SPF1 | Sweet potato DNA-binding factor with two WRKY-domains | P$SP8BF.01 | 0.87 | 931-943 | (−) | 1 | 0.889
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 948-964 | (+) | 1 | 0.908
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 959-969 | (+) | 0.75 | 0.816
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 959-969 | (−) | 1 | 0.909
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 970-980 | (+) | 1 | 0.916
P$AHBP | Arabidopsis homeobox protein | P$ATHB1.01 | 0.90 | 973-983 | (+) | 1 | 0.989
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 973-983 | (−) | 1 | 0.976
P$IDDF | ID domain factors | P$ID1.01 | 0.92 | 976-988 | (+) | 1 | 0.928
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 985-995 | (+) | 1 | 0.916
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 985-1001 | (+) | 1 | 0.891
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 986-1002 | (−) | 1 | 0.877
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 992-1002 | (−) | 1 | 0.916
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 995-1011 | (+) | 1 | 0.935
P$LEGB | Legumin Box family | P$LEGB.01 | 0.65 | 998-1024 | (+) | 0.75 | 0.676
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 1008-1018 | (+) | 1 | 0.937
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 1012-1022 | (−) | 1 | 1
P$MYBL | MYB-like proteins | P$GAMYB.01 | 0.91 | 1022-1038 | (−) | 1 | 0.925
P$SPF1 | Sweet potato DNA-binding factor with two WRKY-domains | P$SP8BF.01 | 0.87 | 1029-1041 | (−) | 0.78 | 0.879
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 1036-1054 | (−) | 1 | 0.83
P$AHBP | Arabidopsis homeobox protein | P$ATHB1.01 | 0.90 | 1054-1064 | (+) | 1 | 0.99
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 1054-1064 | (−) | 0.83 | 0.94
P$GTBX | GT-box elements | P$GT3A.01 | 0.83 | 1066-1082 | (+) | 1 | 0.889
O$PTBP | Plant TATA binding protein factor | O$PTATA.02 | 0.90 | 1086-1100 | (+) | 1 | 0.94
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1087-1103 | (+) | 0.89 | 0.927
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 1088-1102 | (+) | 1 | 0.958
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1089-1105 | (+) | 1 | 0.971
P$DOFF | DNA binding with one finger (DOF) | P$DOF3.01 | 0.99 | 1098-1114 | (+) | 1 | 0.995
P$E2FF | E2F-homolog cell cycle regulators | P$E2F.01 | 0.82 | 1117-1131 | (−) | 1 | 0.833
P$SPF1 | Sweet potato DNA-binding factor with two WRKY-domains | P$SP8BF.01 | 0.87 | 1130-1142 | (+) | 1 | 0.881
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 1146-1162 | (+) | 1 | 0.873
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 1170-1186 | (−) | 1 | 0.797
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 1173-1191 | (+) | 1 | 0.813
P$MADS | MADS box proteins | P$AGL2.01 | 0.82 | 1174-1194 | (+) | 1 | 0.9
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 1189-1199 | (+) | 0.83 | 0.919
P$IDDF | ID domain factors | P$ID1.01 | 0.92 | 1205-1217 | (−) | 1 | 0.97
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 1229-1245 | (−) | 0.76 | 0.763
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 1234-1250 | (−) | 0.94 | 0.88
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 1241-1255 | (+) | 1 | 0.964
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1242-1258 | (+) | 1 | 0.967
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 1265-1281 | (−) | 0.76 | 0.762
P$GTBX | GT-box elements | P$GT3A.01 | 0.83 | 1265-1281 | (+) | 0.75 | 0.839
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 1274-1284 | (−) | 1 | 0.928
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 1277-1291 | (+) | 1 | 0.908
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1278-1294 | (+) | 1 | 0.918
P$OCSE | Enhancer element first identified in the promoter of the octopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNA | P$OCSL.01 | 0.69 | 1278-1298 | (+) | 0.77 | 0.712
P$MYCL | Myc-like basic helix-loop-helix binding factors | P$MYCRS.01 | 0.93 | 1284-1302 | (−) | 0.86 | 0.933
P$TALE | TALE (3-aa acid loop extension) class homeodomain proteins | P$KN1_KIP.01 | 0.88 | 1289-1301 | (−) | 1 | 1
P$AREF | Auxin response element | P$SEBF.01 | 0.96 | 1292-1304 | (+) | 1 | 0.98
P$MSAE | M-phase-specific activator elements | P$MSA.01 | 0.80 | 1295-1309 | (−) | 0.75 | 0.803
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 1296-1312 | (−) | 1 | 0.797
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 1310-1326 | (−) | 0.94 | 0.876
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 1319-1329 | (+) | 1 | 0.93
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 1323-1339 | (−) | 1 | 0.833
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 1327-1337 | (−) | 1 | 0.936
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 1328-1344 | (+) | 1 | 0.939
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 1334-1352 | (+) | 1 | 0.816
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 1335-1345 | (−) | 0.83 | 0.904
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 1335-1345 | (+) | 1 | 0.998
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 1338-1354 | (+) | 1 | 0.896
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 1338-1356 | (−) | 1 | 0.819
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 1345-1355 | (+) | 0.83 | 0.902
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 1345-1355 | (−) | 1 | 0.998
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 1354-1364 | (−) | 1 | 0.916
P$AHBP Arabidopsis P$HAHB4.01 0.87 1365-1375 (−) 1 0.896 
homeobox proteinO$VTBP Vertebrate TA- O$VTATA.01 0.90 1376-1392 (−) 1 0.949 TA bindingprotein factor P$HMGF High mobility P$HMG_IY.01 0.89 1377-1391 (+) 10.952 group factors O$PTBP Plant TATA O$PTATA.01 0.88 1379-1393 (−) 10.883 binding protein factor P$IDDF ID domain factors P$ID1.01 0.921387-1399 (+) 1 0.926 P$MYBL MYB-like proteins P$GAMYB.01 0.91 1389-1405(+) 1 0.939 O$INRE Core promoter O$DINR.01 0.94 1392-1402 (+) 1 0.943initiator elements P$IBOX Plant I-Box P$IBOX.01 0.81 1399-1415 (−) 0.750.822 sites P$MYBL MYB-like proteins P$WER.01 0.87 1410-1426 (+) 1 0.875P$SPF1 Sweet potato P$SP8BF.01 0.87 1412-1424 (+) 1 0.91 DNA-bindingfactor with two WRKY- domains O$VTBP Vertebrate TA- O$LTATA.01 0.821417-1433 (−) 1 0.847 TA binding protein factor P$IBOX Plant I-BoxP$IBOX.01 0.81 1419-1435 (−) 0.75 0.824 sites P$WBXF W Box familyP$WRKY.01 0.92 1429-1445 (−) 1 0.958 P$MYBL MYB-like proteinsP$MYBPH3.02 0.76 1457-1473 (+) 0.82 0.798 P$ROOT Root hair- P$RHE.020.77 1458-1482 (+) 0.75 0.786 specific cis- elements in angiospermsP$LFYB LFY binding P$LFY.01 0.93 1486-1498 (−) 0.91 0.987 site P$CAATCCAAT binding P$CAAT.01 0.97 1490-1498 (−) 1 0.982 factors P$HEAT Heatshock P$HSE.01 0.81 1526-1540 (+) 1 0.833 factors P$GTBX GT-box elementsP$GT1.01 0.85 1536-1552 (−) 0.84 0.869 P$WBXF W Box family P$ERE.01 0.891537-1553 (+) 1 0.9 P$SPF1 Sweet potato P$SP8BF.01 0.87 1546-1558 (+) 10.919 DNA-binding factor with two WRKY- domains P$AHBP ArabidopsisP$BLR.01 0.90 1550-1560 (−) 1 0.93 homeobox protein P$LREM Lightresponsive P$RAP22.01 0.85 1555-1565 (−) 1 0.882 element motif, notmodulated by different light qualities P$NCS1 Nodulin consensusP$NCS1.01 0.85 1559-1569 (−) 0.8 0.855 sequence 1 P$GARP Myb-relatedP$ARR10.01 0.97 1560-1568 (+) 1 0.97 DNA binding proteins (Golden2, ARR,Psr) P$IDDF ID domain factors P$ID1.01 0.92 1563-1575 (+) 1 0.952 P$NCS2Nodulin consensus P$NCS2.01 0.79 1565-1579 (+) 0.75 0.845 sequence 2O$VTBP Vertebrate TA- O$MTATA.01 0.84 1570-1586 (+) 1 0.846 TA 
bindingprotein factor P$DOFF DNA binding P$PBF.01 0.97 1571-1587 (+) 1 0.988with one finger (DOF) P$LEGB Legumin Box P$RY.01 0.87 1572-1598 (−) 10.898 family P$NCS2 Nodulin consensus P$NCS2.01 0.79 1610-1624 (+) 10.867 sequence 2 P$MADS MADS box P$AGL3.01 0.83 1637-1657 (+) 1 0.851proteins P$GTBX GT-box elements P$GT3A.01 0.83 1652-1668 (−) 1 0.854P$MYBL MYB-like proteins P$NTMYBAS1.01 0.96 1654-1670 (−) 1 0.971 P$AHBPArabidopsis P$HAHB4.01 0.87 1671-1681 (+) 1 0.934 homeobox proteinP$OCSE Enhancer element P$OCSL.01 0.69 1677-1697 (+) 1 0.763 firstidentified in the promoter of the octopine synthase gene (OCS) of theAgrobacterium tumefaciens T- DNA P$GBOX Plant G-box/C- P$GBF1.01 0.941682-1702 (−) 1 0.968 box bZIP proteins P$ABRE ABA response P$ABRE.010.82 1685-1701 (−) 1 0.855 elements P$BRRE Brassinosteroid P$BZR1.010.95 1696-1712 (−) 1 0.954 (BR) response element P$GBOX Plant G-box/C-P$GBF1.01 0.94 1696-1716 (−) 1 0.963 box bZIP proteins P$TEFB TEF-boxP$TEF1.01 0.76 1696-1716 (−) 0.84 0.799 P$DPBF Dc3 promoter P$DPBF.010.89 1700-1710 (+) 1 0.943 binding factors P$EREF Ethylen responeP$ANT.01 0.81 1701-1717 (+) 1 0.862 element factors P$LEGB Legumin BoxP$RY.01 0.87 1701-1727 (−) 1 0.925 family P$LEGB Legumin Box P$RY.010.87 1704-1730 (+) 1 0.967 family P$LEGB Legumin Box P$IDE1.01 0.771708-1734 (+) 1 0.888 family P$MADS MADS box P$MADS.01 0.75 1722-1742(+) 1 0.758 proteins P$MYBS MYB proteins P$TAMYB80.01 0.83 1727-1743 (+)1 0.861 with single DNA binding repeat P$URNA Upstream sequence P$USE.010.75 1731-1747 (+) 1 0.77 element of U- snRNA genes P$ROOT Root hair-P$RHE.02 0.77 1740-1764 (+) 1 0.79 specific cis- elements in angiospermsP$GBOX Plant G-box/C- P$EMBP1.01 0.84 1747-1767 (−) 1 0.84 box bZIPproteins P$ABRE ABA response P$ABRE.01 0.82 1750-1766 (−) 1 0.831elements O$VTBP Vertebrate TA- O$VTATA.01 0.90 1756-1772 (+) 1 0.957 TAbinding protein factor P$MYBL MYB-like proteins P$MYBPH3.02 0.761765-1781 (−) 1 0.781

1.2 Vector Construction

Using the Multisite Gateway System (Invitrogen, Carlsbad, Calif., USA), promoter::reporter-gene cassettes were assembled into binary constructs for plant transformation. The beta-Glucuronidase (GUS) or uidA gene, which encodes an enzyme for which various chromogenic substrates are known, was used as the reporter for determining the expression features of the permutated p-PvArc5_perm (SEQ ID NO2) and p-VfSBP_perm (SEQ ID NO4) promoter sequences. The DNA fragments representing promoters p-PvArc5_perm (SEQ ID NO2) and p-VfSBP_perm (SEQ ID NO4) were generated by gene synthesis. Endonucleolytic restriction sites suitable for cloning the promoter fragments into beta-Glucuronidase reporter gene cassettes were included in the synthesis. The p-PvArc5_perm (SEQ ID NO2) promoter was cloned into a pENTR/A vector harboring the beta-Glucuronidase reporter gene c-GUS (with the prefix c- denoting coding sequence) followed by the t-PvArc (with the prefix t- denoting terminator) transcription terminator sequence using restriction endonucleases FseI and NcoI, yielding construct LJB2012. Similarly, the p-VfSBP_perm (SEQ ID NO4) promoter was cloned into a pENTR/B vector harboring the beta-Glucuronidase reporter gene c-GUS followed by the t-StCatpA transcription terminator sequence using restriction endonucleases FseI and NcoI, yielding construct LJB2007.

The complementary pENTR vectors without any expression cassettes were constructed by introduction of a multiple cloning site via KpnI and HindIII restriction sites. By performing a site-specific recombination (LR reaction), the created pENTR/A, pENTR/B and pENTR/C were combined with the pSUN destination vector (pSUN derivative) according to the manufacturer's (Invitrogen, Carlsbad, Calif., USA) Multisite Gateway manual. The reactions yielded a binary vector with the p-PvArc5_perm (SEQ ID NO2) promoter, the beta-Glucuronidase coding sequence c-GUS and the t-PvArc terminator, for which the full construct sequence is given (SEQ ID NO7). Likewise, a binary vector was obtained with the p-VfSBP_perm (SEQ ID NO4) promoter, the beta-Glucuronidase reporter gene and the t-StCatpA terminator, for which the full construct sequence is given (SEQ ID NO8). The resulting plant transformation vectors are summarized in Table 5:

TABLE 5 Plant expression vectors for B. napus transformation

Plant expression   Composition of the expression cassette   SEQ
vector             Promoter::reporter gene::terminator      ID NO
LJB2045            p-PvArc5_perm::c-GUS::t-PvArc            7
LJB2043            p-VfSBP_perm::c-GUS::t-StCatpA           8

1.3 Generation of Transgenic Rapeseed Plants (Amended Protocol According to Moloney et al., 1992, Plant Cell Reports, 8: 238-242)

In preparation for the generation of transgenic rapeseed plants, the binary vectors were transformed into Agrobacterium tumefaciens C58C1:pGV2260 (Deblaere et al., 1985, Nucl. Acids. Res. 13: 4777-4788). A 1:50 dilution of an overnight culture of Agrobacteria harboring the respective binary construct was grown in Murashige-Skoog Medium (Murashige and Skoog, 1962, Physiol. Plant 15, 473) supplemented with 3% saccharose (3MS-Medium). For the transformation of rapeseed plants, petioles or hypocotyledons of sterile plants were incubated with a 1:50 Agrobacterium solution for 5-10 minutes, followed by a three-day co-incubation in darkness at 25° C. on 3MS-Medium supplemented with 0.8% Bacto-agar. After three days, the explants were transferred to MS-Medium containing 500 mg/l Claforan (Cefotaxime-Sodium), 100 nM Imazethapyr, 20 microM Benzylaminopurin (BAP) and 1.6 g/l Glucose in a 16 h light/8 h darkness light regime, which was repeated in weekly periods. Growing shoots were transferred to MS-Medium containing 2% saccharose, 250 mg/l Claforan and 0.8% Bacto-agar. After 3 weeks, the growth hormone 2-Indolbutyl acid was added to the medium to promote root formation. Following root development, shoots were transferred to soil, grown for two weeks in a growth chamber and then grown to maturity under greenhouse conditions.

Example 2 Expression Profile of the p-PvArc5_Perm and p-VfSBP_Perm Gene Control Elements

To demonstrate and analyze the transcription regulating properties of a promoter, it is useful to operably link the promoter or its fragments to a reporter gene, which can be employed to monitor its expression both qualitatively and quantitatively. Preferably bacterial β-glucuronidase is used (Jefferson 1987). β-glucuronidase activity can be monitored in planta with chromogenic substrates such as 5-bromo-4-chloro-3-indolyl-β-D-glucuronic acid during corresponding activity assays (Jefferson 1987). For determination of promoter activity and tissue specificity, plant tissue is dissected, stained and analyzed as described (e.g., Bäumlein 1991).

The regenerated transgenic T0 rapeseed plants harboring single or double insertions of the transgene deriving from constructs LJB2043 or LJB2045 were used for reporter gene analysis.

Table 6 summarizes the reporter gene activity observed in plants harboring transgenes containing SEQ ID NO2 and SEQ ID NO4 in constructs LJB2045 and LJB2043, respectively:

TABLE 6 beta-Glucuronidase reporter gene activity in selected rapeseed plants harboring transgenes with SEQ ID NO2 (p-PvArc5_perm) and SEQ ID NO4 (p-VfSBP_perm) compared to the GUS expression derived from the respective starting sequence in rapeseed (p-VfSBP) or Phaseolus and Arabidopsis plants (p-PvArc5).

                         LJB2043                      LJB2045
Tissue                   p-VfSBP_perm   p-VfSBP       p-PvArc5_perm   p-PvArc5*
leaves                   negative       negative      negative        negative
stem                     negative       negative      negative        negative
roots                    negative       negative      negative        negative
flower                   negative       negative      negative        negative
silique (without seed)   negative       not assayed   negative        not analyzed
embryo (early)           weak           weak          strong          strong**
embryo (young)           weak           weak          strong          strong**
embryo (medium)          strong         strong        strong          strong**
embryo (mature)          strong         strong        strong          strong
seed shell               weak           not analyzed  strong          strong

*expression in Phaseolus and Arabidopsis according to Goossens et al.
**no separate analyses of different stages

The gene expression activity conferred by p-PvArc5_perm and p-VfSBP_perm is shown by way of example in FIG. 1 (p-PvArc5_perm) and in FIG. 2 (p-VfSBP_perm).

General results for SEQ ID NO2: Strong GUS expression was detected in all stages of embryo development and in seed shells. No activity was found in other tissues analyzed.

General results for SEQ ID NO4: Weak GUS expression was detected in early and young embryo stages; strong GUS expression was observed in medium and mature embryos. Weak expression was observed in seed shells. No activity was found in other tissues investigated.

Example 3

3.1 Random Permutation of the Promoter Sequence

Using publicly available data, a promoter showing seed specific expression in plants was selected for analyzing the effects of sequence permutation in periodic intervals throughout the full length of the promoter DNA sequence. The wild type sequence of the Brassica napus p-BnNapin promoter was analyzed and annotated for the occurrence of cis-regulatory elements using available literature data (Ellerstrom et al., Ericson et al., Ezcurra et al.). Subsequently, the DNA sequence of the promoter was permutated in the region of −1000 to +1 nucleotides according to the following criteria to yield p-BnNapin_perm (SEQ ID NO6): DNA permutation was conducted so as not to affect cis-regulatory elements that had previously been proven to be essential for seed specific gene expression, nor motifs essential for gene expression. The remaining promoter sequence was randomly permutated, resulting in a promoter sequence with an overall nucleotide homology of 75% to the initial p-BnNapin sequence.
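The permutation criteria above can be illustrated with a minimal sketch. It assumes that the protected cis-regulatory elements are supplied as 0-based, end-exclusive intervals, that "homology" is read as per-position identity, and that the sequence contains only A, C, G and T. The function names `permute_promoter` and `identity` are hypothetical and are not part of the described method. The sketch does not check whether a substitution creates a new motif.

```python
import random

def permute_promoter(seq, protected, target_identity=0.75, seed=0):
    """Substitute bases outside the protected intervals until the overall
    per-position identity to the starting sequence equals target_identity."""
    rng = random.Random(seed)
    seq = seq.upper()
    # Collect all positions covered by protected (0-based, end-exclusive) intervals.
    locked = set()
    for start, end in protected:
        locked.update(range(start, end))
    free = [i for i in range(len(seq)) if i not in locked]
    # Number of substitutions needed to reach the target identity.
    n_changes = round(len(seq) * (1.0 - target_identity))
    if n_changes > len(free):
        raise ValueError("not enough unprotected positions")
    result = list(seq)
    for i in rng.sample(free, n_changes):
        # Always pick a base different from the original one.
        result[i] = rng.choice([b for b in "ACGT" if b != seq[i]])
    return "".join(result)

def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

if __name__ == "__main__":
    start_seq = "ACGT" * 25              # toy 100-nt sequence, not a real promoter
    elements = [(10, 20), (50, 60)]      # hypothetical protected elements
    permuted = permute_promoter(start_seq, elements)
    print(identity(start_seq, permuted))
```

A real design would additionally re-annotate the permutated sequence and reject candidates in which essential motifs are lost or unwanted motifs appear.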

3.2 Vector Construction

Using the Multisite Gateway System (Invitrogen, Carlsbad, Calif., USA), promoter::reporter-gene cassettes were assembled into binary constructs for plant transformation. The beta-Glucuronidase (GUS) or uidA gene, which encodes an enzyme for which various chromogenic substrates are known, was used as the reporter for determining the expression features of the permutated p-BnNapin_perm (SEQ ID NO6) promoter sequence.

The DNA fragment representing promoter p-BnNapin_perm was generated by gene synthesis. Endonucleolytic restriction sites suitable for cloning the promoter fragment into a beta-Glucuronidase reporter gene cassette were included in the synthesis. The p-BnNapin_perm (SEQ ID NO6) promoter was cloned into a pENTR/A vector harboring the beta-Glucuronidase reporter gene c-GUS (with the prefix c- denoting coding sequence) followed by the t-nos transcription terminator sequence using restriction endonucleases BamHI and NcoI, yielding pENTR/A LLL1168.

A 1138 bp DNA fragment representing the native promoter p-BnNapin (SEQ ID NO5) was generated by PCR with the following primers.

SEQ ID NO 11 Loy963 GATATAGGTACCTCTTCATCGGTGATTGATTCCT
SEQ ID NO 12 Loy964 GATATACCATGGTCGTGTATGTTTTTAATCTTGTTTG

Endonucleolytic restriction sites suitable for cloning the promoter fragment into a beta-Glucuronidase reporter gene cassette were included in the primers. The p-BnNapin (SEQ ID NO5) promoter was cloned into a pENTR/A vector harboring the beta-Glucuronidase reporter gene c-GUS (with the prefix c- denoting coding sequence) followed by the t-nos transcription terminator sequence using restriction endonucleases KpnI and NcoI, yielding pENTR/A LLL1166.
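As a quick consistency check, the following sketch locates the KpnI (GGTACC) and NcoI (CCATGG) recognition sequences in the two primers given above. The helper `find_sites` is a hypothetical name introduced only for illustration.

```python
# Recognition sequences of the two enzymes named in the text.
SITES = {"KpnI": "GGTACC", "NcoI": "CCATGG"}

# Primer sequences as listed for SEQ ID NO 11 and SEQ ID NO 12.
PRIMERS = {
    "Loy963": "GATATAGGTACCTCTTCATCGGTGATTGATTCCT",
    "Loy964": "GATATACCATGGTCGTGTATGTTTTTAATCTTGTTTG",
}

def find_sites(primer):
    """Return {enzyme: 0-based index} for each recognition site found in primer."""
    hits = {}
    for enzyme, site in SITES.items():
        pos = primer.find(site)
        if pos != -1:
            hits[enzyme] = pos
    return hits

if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, find_sites(primer))
```

In both primers the recognition site follows a six-nucleotide 5′ overhang (GATATA), so Loy963 carries the KpnI site and Loy964 the NcoI site.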

By performing a site-specific recombination (LR reaction), the newly created pENTR/A vectors LLL1168 and LLL1166 were each combined with pENTR/B, pENTR/C and the pSUN destination vector (pSUN derivative) according to the manufacturer's (Invitrogen, Carlsbad, Calif., USA) Multisite Gateway manual. The reactions yielded binary vector LLL1184 with the p-BnNapin_perm (SEQ ID NO6) promoter, the beta-Glucuronidase coding sequence c-GUS and the t-nos terminator, and binary vector LLL1176 with the native p-BnNapin (SEQ ID NO5) promoter, the beta-Glucuronidase coding sequence c-GUS and the t-nos terminator. For both vectors the full construct sequence is given (SEQ ID NO9 and 10). The resulting plant transformation vectors are shown in Table 7:

TABLE 7 Plant expression vectors for A. thaliana transformation

Plant expression   Composition of the expression cassette   SEQ
vector             Promoter::reporter gene::terminator      ID NO
LLL1184            p-BnNapin_perm::c-GUS::t-nos             9
LLL1176            p-BnNapin::c-GUS::t-nos                  10

3.3 Generation of Arabidopsis thaliana Plants

A. thaliana plants were grown in soil until they flowered. Agrobacterium tumefaciens (strain C58C1 [pMP90]) transformed with the construct of interest was grown in 500 mL of liquid YEB medium (5 g/L Beef extract, 1 g/L Yeast Extract (Duchefa), 5 g/L Peptone (Duchefa), 5 g/L sucrose (Duchefa), 0.49 g/L MgSO₄ (Merck)) until the culture reached an OD₆₀₀ of 0.8-1.0. The bacterial cells were harvested by centrifugation (15 minutes, 5,000 rpm) and resuspended in 500 mL infiltration solution (5% sucrose, 0.05% SILWET L-77 [distributed by Lehle seeds, Cat. No. VIS-02]). Flowering plants were dipped for 10-20 seconds into the Agrobacterium solution. Afterwards the plants were kept in the dark for one day and then in the greenhouse until seeds could be harvested. Transgenic seeds were selected on soil by spraying the seeds directly after sowing with a solution of 0.016 g/l Imazamox. After 12 to 14 days, surviving plants were transferred to pots and grown in the greenhouse.
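The amounts needed for the 500 mL volumes above follow from simple scaling. The sketch below assumes that the sucrose percentage is weight per volume and the SILWET L-77 percentage is volume per volume; the text does not state this. The helper names are hypothetical.

```python
# YEB medium composition in g/L, as given in the text.
YEB_G_PER_L = {
    "beef extract": 5.0,
    "yeast extract": 1.0,
    "peptone": 5.0,
    "sucrose": 5.0,
    "MgSO4": 0.49,
}

def scale_recipe(g_per_l, volume_ml):
    """Grams of each component needed for volume_ml of medium."""
    return {name: conc * volume_ml / 1000.0 for name, conc in g_per_l.items()}

def percent_amount(percent, volume_ml):
    """Amount (g for w/v, mL for v/v) corresponding to a percentage of volume_ml."""
    return percent / 100.0 * volume_ml

if __name__ == "__main__":
    print(scale_recipe(YEB_G_PER_L, 500))   # components for 500 mL YEB
    print(percent_amount(5, 500))           # sucrose in g, assuming w/v
    print(percent_amount(0.05, 500))        # SILWET L-77 in mL, assuming v/v
```

Under these assumptions, 500 mL of infiltration solution requires about 25 g sucrose and 0.25 mL SILWET L-77.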

Example 4 Expression Profile of the Native p-BnNapin and the p-BnNapin_Perm Gene Control Elements

To demonstrate and analyze the transcription regulating properties of a promoter, it is useful to operably link the promoter or its fragments to a reporter gene, which can be employed to monitor its expression both qualitatively and quantitatively. Preferably bacterial β-glucuronidase is used (Jefferson 1987). β-glucuronidase activity can be monitored in planta with chromogenic substrates such as 5-bromo-4-chloro-3-indolyl-β-D-glucuronic acid during corresponding activity assays (Jefferson 1987). For determination of promoter activity and tissue specificity, plant tissue is dissected, stained and analyzed as described (e.g., Bäumlein 1991).

The regenerated transgenic T0 Arabidopsis plants harboring single or double insertions of the transgene deriving from constructs LLL1184 (SEQ ID NO9) and LLL1176 (SEQ ID NO10) were used for reporter gene analysis. Table 8 summarizes the reporter gene activity observed in plants harboring transgenes containing SEQ ID NO9 and SEQ ID NO10 in constructs LLL1184 and LLL1176, respectively:

TABLE 8 beta-Glucuronidase reporter gene activity in selected Arabidopsis plants harboring transgenes with SEQ ID NO 9 or 10, respectively.

Tissue            LLL1176    LLL1184
Leaves            negative   negative
Stem              negative   negative
Roots             negative   negative
Flower            negative   negative
Silique           weak       weak
Embryo (medium)   strong     strong
Embryo (mature)   strong     strong

The gene expression activity conferred by p-BnNapin and p-BnNapin_perm is shown by way of example in FIG. 3 (p-BnNapin SEQ ID NO5, p-BnNapin_perm SEQ ID NO6).

General results for SEQ ID NO5 and 6: For both promoters, p-BnNapin and p-BnNapin_perm, strong GUS expression was detected in medium to mature stages of embryo development. Weak expression was observed in seed shells and in siliques. No activity was found in other tissues analyzed.

Example 5 Directed Permutation of a Constitutive Promoter Sequence

Using publicly available data, one promoter showing constitutive expression in plants was selected (de Pater, B. S., van der Mark, F., Rueb, S., Katagiri, F., Chua, N. H., Schilperoort, R. A. and Hensgens, L. A. (1992) The promoter of the rice gene GOS2 is active in various different monocot tissues and binds rice nuclear factor ASF-1. Plant J. 2 (6)) for analyzing the effects of sequence permutation in periodic intervals throughout the full length of the promoter DNA sequence. The wild type or starting sequence of the Oryza sativa p-GOS2 promoter (SEQ ID NO 13) (with the prefix p- denoting promoter) was analyzed and annotated for the occurrence of motifs, boxes and cis-regulatory elements using e.g. the GEMS Launcher Software (www.genomatix.de) as described above in Example 1.

The promoter p-GOS2 encompasses a 5′UTR sequence with an internal intron. To ensure correct splicing of the intron after permutation, the splice sites and the putative branch point were not altered. No nucleotide exchanges were introduced into sequences 10 bp up- and downstream of the splice sites (5′ GT; 3′ CAG), and “TNA” sequence elements within the last 100 base pairs of the original p-GOS2 were preserved after permutation.
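The splice-site constraints described above can be expressed as a set of locked positions that a permutation routine must not touch. The sketch below is illustrative only: the function name `locked_positions`, the 0-based coordinates of the donor (GT) and acceptor (CAG) sites, and the reading of "TNA" as T, any nucleotide, A are assumptions, not details taken from the sequence listing.

```python
import re

def locked_positions(seq, donor, acceptor, flank=10, tail=100):
    """Return the set of 0-based positions that must stay unchanged.

    donor / acceptor: index of the first base of the 5' GT and 3' CAG sites.
    flank: number of bases protected on each side of a splice site.
    tail: length of the 3' region in which TNA elements are protected.
    """
    locked = set()
    # Protect each splice site plus `flank` bases on either side.
    for start, length in ((donor, 2), (acceptor, 3)):
        lo = max(0, start - flank)
        hi = min(len(seq), start + length + flank)
        locked.update(range(lo, hi))
    # Protect every (possibly overlapping) TNA element in the last `tail` bases.
    tail_start = max(0, len(seq) - tail)
    for match in re.finditer(r"(?=T[ACGT]A)", seq[tail_start:]):
        i = tail_start + match.start()
        locked.update(range(i, i + 3))
    return locked

if __name__ == "__main__":
    # Toy sequence with a GT at index 20, a CAG at index 52 and one TGA element.
    toy = "C" * 20 + "GT" + "C" * 30 + "CAG" + "C" * 10 + "TGA" + "C" * 5
    print(sorted(locked_positions(toy, donor=20, acceptor=52)))
```

The resulting set can be passed to any permutation step as the list of positions excluded from nucleotide exchange.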

Subsequently, the DNA sequence of the promoter was permutated according to the method of the invention to yield p-GOS2_perm1 and p-GOS2_perm2, respectively (SEQ ID NO 14 and 15).

The lists of motifs, boxes and cis-regulatory elements in the p-GOS2 promoters before and after the permutation are shown in Table 9 for the starting sequence of p-GOS2, Table 10 for the p-GOS2_perm1 sequence (SEQ ID NO 14) and Table 11 for the p-GOS2_perm2 sequence (SEQ ID NO 15).

Empty lines represent motifs, boxes and cis-regulatory elements that are not found in one sequence but are present in the corresponding sequence, that is, elements that were deleted from the starting sequence or introduced into the permutated sequence.
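The comparison that the empty lines encode can be sketched as a difference between two motif lists. This is a simplified illustration that compares only how often each matrix name occurs; the tables themselves also align matches by position, which this sketch ignores. The function name `motif_diff` and the example lists are hypothetical.

```python
from collections import Counter

def motif_diff(before, after):
    """Compare two lists of matrix names (e.g. 'P$SBF1.01').

    Returns (deleted, introduced): matrix names with the number of matches
    lost from the starting sequence and gained in the permutated sequence.
    """
    b, a = Counter(before), Counter(after)
    deleted = dict(b - a)      # present more often before than after
    introduced = dict(a - b)   # present more often after than before
    return deleted, introduced

if __name__ == "__main__":
    # Hypothetical match lists, not taken from Tables 9 to 11.
    start = ["P$NCS1.01", "P$MSA.01", "P$GAMYB.01", "P$GAMYB.01"]
    perm = ["P$NCS1.01", "P$GAMYB.01", "P$WER.01"]
    print(motif_diff(start, perm))
```

Counter subtraction keeps only positive counts, so a matrix that occurs equally often in both sequences appears in neither result.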

TABLE 9 Boxes and Motifs identified in the starting sequence of thep-GOS2 promoter Position Core Matrix p-GOS2 Position sim. sim. FamilyFurther Family Information Matrix Opt. from-to — — P$NCS1 Nodulinconsensus sequence 1 P$NCS1.01 0.85 6 16 1 0.857 P$MSAE M-phase-specificactivator P$MSA.01 0.8 15 29 1 0.832 elements P$MYBL MYB-like proteinsP$GAMYB.01 0.91 29 45 1 0.927 P$MYBL MYB-like proteins P$WER.01 0.87 3349 1 0.897 P$MADS MADS box proteins P$AGL2.01 0.82 35 55 0.79 0.82P$NACF Plant specific NAC [NAM P$IDEF2.01 0.96 48 60 1 0.96 (no apicalmeristem), ATAF172, CUC2 (cup- shaped cotyledons 2)] transcriptionfactors P$BRRE Brassinosteroid (BR) response P$BZR1.01 0.95 48 64 10.954 element O$PTBP Plant TATA binding protein O$PTATA.01 0.88 60 74 10.883 factor O$VTBP Vertebrate TATA binding O$VTATA.01 0.9 61 77 1 0.961protein factor O$INRE Core promoter initiator elements O$DINR.01 0.94 6575 0.97 0.94 O$VTBP Vertebrate TATA binding O$LTATA.01 0.82 69 85 10.842 protein factor O$VTBP Vertebrate TATA binding O$VTATA.01 0.9 71 870.89 0.921 protein factor O$YTBP Yeast TATA binding protein O$SPT15.010.83 74 90 1 0.832 factor O$YTBP Yeast TATA binding protein O$SPT15.010.83 75 91 1 0.876 factor O$VTBP Vertebrate TATA binding O$ATATA.01 0.7876 92 0.75 0.781 protein factor O$YTBP Yeast TATA binding proteinO$SPT15.01 0.83 77 93 0.76 0.835 factor P$MIIG MYB IIG-type bindingsites P$PALBOXL.01 0.8 118 132 0.77 0.841 P$DOFF DNA binding with onefinger P$DOF1.01 0.98 126 142 1 0.99 (DOF) P$DOFF DNA binding with onefinger P$PBF.01 0.97 149 165 1 0.989 (DOF) P$WNAC Wheat NAC-domaintranscription P$TANAC69.01 0.68 170 192 0.81 0.712 factors O$VTBPVertebrate TATA binding O$ATATA.01 0.78 187 203 1 0.922 protein factorP$E2FF E2F-homolog cell cycle P$E2F.01 0.82 193 207 1 0.829 regulatorsO$INRE Core promoter initiator elements O$DINR.01 0.94 200 210 0.970.945 P$AHBP Arabidopsis homeobox P$ATHB5.01 0.89 207 217 0.83 0.903protein P$AHBP Arabidopsis homeobox P$HAHB4.01 0.87 207 217 1 
0.967protein P$CNAC Calcium regulated NAC- P$CBNAC.02 0.85 215 235 1 0.947factors P$MYBS MYB proteins with single P$PHR1.01 0.84 217 233 1 0.944DNA binding repeat P$OCSE Enhancer element first P$OCSL.01 0.69 216 2361 0.722 identified in the promoter of the octopine synthase gene (OCS)of the Agrobacterium tumefaciens T-DNA P$MYBS MYB proteins with singleP$PHR1.01 0.84 222 238 1 0.979 DNA binding repeat P$GTBX GT-box elementsP$SBF1.01 0.87 246 262 1 0.901 P$STKM Storekeeper motif P$STK.01 0.85251 265 1 0.85 P$AHBP Arabidopsis homeobox P$ATHB5.01 0.89 254 264 0.830.904 protein P$AHBP Arabidopsis homeobox P$BLR.01 0.9 254 264 1 0.998protein P$HEAT Heat shock factors P$HSFA1A.01 0.75 284 300 1 0.757P$CCAF Circadian control factors P$CCA1.01 0.85 297 311 1 0.953 P$LFYBLFY binding site P$LFY.01 0.93 318 330 0.91 0.945 P$GAGA GAGA elementsP$BPC.01 1 329 353 1 1 P$CCAF Circadian control factors P$EE.01 0.84 335349 0.75 0.865 P$GAGA GAGA elements P$BPC.01 1 331 355 1 1 P$CCAFCircadian control factors P$CCA1.01 0.85 337 351 1 0.968 P$GTBX GT-boxelements P$SBF1.01 0.87 341 357 1 0.875 P$MADS MADS box proteinsP$SQUA.01 0.9 345 365 1 0.925 P$CCAF Circadian control factors P$EE.010.84 363 377 1 0.925 O$VTBP Vertebrate TATA binding O$MTATA.01 0.84 383399 1 0.895 protein factor P$CARM CA-rich motif P$CARICH.01 0.78 388 4061 0.785 P$AHBP Arabidopsis homeobox P$HAHB4.01 0.87 397 407 1 0.902protein O$VTBP Vertebrate TATA binding O$LTATA.01 0.82 395 411 1 0.889protein factor O$VTBP Vertebrate TATA binding O$LTATA.01 0.82 396 412 10.844 protein factor O$PTBP Plant TATA binding protein O$PTATA.01 0.88398 412 1 0.892 factor O$VTBP Vertebrate TATA binding O$ATATA.01 0.78397 413 0.75 0.781 protein factor P$AHBP Arabidopsis homeobox P$HAHB4.010.87 400 410 1 0.902 protein O$VTBP Vertebrate TATA binding O$ATATA.010.78 402 418 0.75 0.781 protein factor O$VTBP Vertebrate TATA bindingO$VTATA.02 0.89 405 421 1 0.983 protein factor O$PTBP Plant TATA bindingprotein O$PTATA.02 0.9 408 422 1 0.917 
factor P$OCSE Enhancer elementfirst P$OCSTF.01 0.73 426 446 1 0.784 identified in the promoter of theoctopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNAP$AHBP Arabidopsis homeobox P$HAHB4.01 0.87 440 450 1 0.926 proteinP$AHBP Arabidopsis homeobox P$WUS.01 0.94 444 454 1 1 protein P$OPAQOpaque-2 like transcriptional P$O2_GCN4.01 0.81 447 463 1 0.819activators P$SEF4 Soybean embryo factor 4 P$SEF4.01 0.98 472 482 1 0.984P$GTBX GT-box elements P$SBF1.01 0.87 481 497 1 0.922 P$DOFF DNA bindingwith one finger P$DOF1.01 0.98 482 498 1 0.994 (DOF) P$GTBX GT-boxelements P$SBF1.01 0.87 482 498 1 0.9 P$WBXF W Box family P$WRKY11.010.94 493 509 1 0.963 P$SEF4 Soybean embryo factor 4 P$SEF4.01 0.98 504514 1 0.994 P$IBOX Plant I-Box sites P$GATA.01 0.93 509 525 1 0.961P$NCS1 Nodulin consensus sequence 1 P$NCS1.01 0.85 515 525 1 0.948P$GTBX GT-box elements P$S1F.01 0.79 518 534 0.75 0.793 P$LREM Lightresponsive element P$RAP22.01 0.85 527 537 1 0.897 motif, not modulatedby different light qualities P$L1BX L1 box, motif for L1 layer-P$ATML1.01 0.82 525 541 0.75 0.825 specific expression O$VTBP VertebrateTATA binding O$ATATA.01 0.78 539 555 0.75 0.782 protein factor P$ROOTRoot hair-specific cis- P$RHE.01 0.77 568 592 0.75 0.772 elements inangiosperms P$ABRE ABA response elements P$ABRE.01 0.82 591 607 1 0.837P$ASRC AS1/AS2 repressor complex P$AS1_AS2_II.01 0.86 599 607 1 0.867P$L1BX L1 box, motif for L1 layer- P$HDG9.01 0.77 629 645 1 0.89specific expression P$L1BX L1 box, motif for L1 layer- P$HDG9.01 0.77631 647 0.8 0.783 specific expression P$L1BX L1 box, motif for L1 layer-P$ATML1.01 0.82 638 654 1 0.877 specific expression P$CCAF Circadiancontrol factors P$EE.01 0.84 649 663 1 0.899 P$DOFF DNA binding with onefinger P$PBF.01 0.97 687 703 1 0.987 (DOF) P$GTBX GT-box elementsP$SBF1.01 0.87 689 705 1 0.888 P$AHBP Arabidopsis homeobox P$BLR.01 0.9695 705 1 0.929 protein P$CCAF Circadian control factors P$EE.01 0.84694 708 1 0.954 P$LREM Light responsive element 
P$RAP22.01 0.85 701 7111 1 motif, not modulated by different light qualities O$VTBP VertebrateTATA binding O$ATATA.01 0.78 699 715 0.75 0.822 protein factor P$HMGFHigh mobility group factors P$HMG_IY.01 0.89 711 725 1 0.929 P$AHBPArabidopsis homeobox P$ATHB1.01 0.9 716 726 0.79 0.901 protein P$AHBPArabidopsis homeobox P$BLR.01 0.9 716 726 1 0.998 protein O$VTBPVertebrate TATA binding O$VTATA.02 0.89 716 732 1 0.893 protein factorP$SUCB Sucrose box P$SUCROSE.01 0.81 715 733 1 0.856 P$DOFF DNA bindingwith one finger P$PBOX.01 0.75 718 734 0.76 0.762 (DOF) P$HEAT Heatshock factors P$HSE.01 0.81 718 734 1 0.833 P$GAPB GAP-Box (lightresponse P$GAP.01 0.88 733 747 1 0.924 elements) P$MYBL MYB-likeproteins P$MYBPH3.02 0.76 744 760 0.78 0.834 O$VTBP Vertebrate TATAbinding O$ATATA.01 0.78 754 770 0.75 0.831 protein factor P$TELO Telobox (plant interstitial P$ATPURA.01 0.85 756 770 0.75 0.869 telomeremotifs) P$MYCL Myc-like basic helix-loop- P$OSBHLH66.01 0.85 789 807 10.851 helix binding factors P$BRRE Brassinosteroid (BR) responseP$BZR1.01 0.95 793 809 1 0.998 element P$URNA Upstream sequence elementP$USE.01 0.75 812 828 0.75 0.797 of U-snRNA genes P$MADS MADS boxproteins P$AGL1.01 0.84 812 832 1 0.895 P$MADS MADS box proteinsP$AGL1.01 0.84 813 833 0.92 0.911 P$NCS1 Nodulin consensus sequence 1P$NCS1.01 0.85 872 882 0.81 0.888 P$LREM Light responsive elementP$RAP22.01 0.85 879 889 1 0.896 motif, not modulated by different lightqualities P$MSAE M-phase-specific activator P$MSA.01 0.8 880 894 1 0.877elements P$MYBL MYB-like proteins P$NTMYBAS1.01 0.96 900 916 0.95 0.968P$GTBX GT-box elements P$SBF1.01 0.87 909 925 1 0.905 P$MYBL MYB-likeproteins P$AS1_AS2_I.01 0.99 911 927 1 1 P$LREM Light responsive elementP$RAP22.01 0.85 981 991 1 0.893 motif, not modulated by different lightqualities O$PTBP Plant TATA binding protein O$PTATA.02 0.9 982 996 10.951 factor P$L1BX L1 box, motif for L1 layer- P$PDF2.01 0.85 982 998 10.884 specific expression O$VTBP Vertebrate TATA binding 
O$VTATA.01 0.9983 999 1 0.955 protein factor P$MADS MADS box proteins P$AGL15.01 0.791006 1026 0.83 0.793 P$MYBS MYB proteins with single P$ZMMRP1.01 0.791008 1024 0.78 0.811 DNA binding repeat O$PTBP Plant TATA bindingprotein O$PTATA.02 0.9 1010 1024 1 0.91 factor P$CGCG Calmodulinbinding/ P$ATSR1.01 0.84 1051 1067 1 0.859 CGCG box binding proteinsP$ABRE ABA response elements P$ABF1.01 0.79 1053 1069 1 0.837 P$CE3SCoupling element 3 sequence P$CE3.01 0.77 1052 1070 1 0.893 P$NACF Plantspecific NAC [NAM P$ANAC092.01 0.92 1055 1067 1 0.927 (no apicalmeristem), ATAF172, CUC2 (cup- shaped cotyledons 2)] transcriptionfactors P$DPBF Dc3 promoter binding factors P$DPBF.01 0.89 1057 1067 10.908 P$PREM Motifs of plastid response P$MGPROTORE.01 0.77 1059 1089 10.806 elements O$MTEN Core promoter motif ten O$HMTE.01 0.88 1072 10920.96 0.94 elements P$DREB Dehydration responsive P$HVDRF1.01 0.89 10791093 1 0.922 element binding factors P$PREM Motifs of plastid responseP$MGPROTORE.01 0.77 1077 1107 1 0.805 elements O$MTEN Core promotermotif ten O$DMTE.01 0.77 1097 1117 0.84 0.805 elements P$OPAQ Opaque-2like transcriptional P$O2.02 0.87 1135 1151 1 0.915 activators P$SALTSalt/drought responsive P$ALFIN1.02 0.95 1136 1150 1 0.954 elementsP$L1BX L1 box, motif for L1 layer- P$PDF2.01 0.85 1179 1195 1 0.882specific expression P$SBPD SBP-domain proteins P$SBP.01 0.88 1199 1215 10.912 P$PALA Conserved box A in PAL P$PALBOXA.01 0.84 1201 1219 1 0.863and 4CL gene promoters P$MYBS MYB proteins with single P$ZMMRP1.01 0.791230 1246 1 0.833 DNA binding repeat P$AHBP Arabidopsis homeoboxP$ATHB9.01 0.77 1244 1254 1 0.867 protein P$MADS MADS box proteinsP$AGL2.01 0.82 1248 1268 0.97 0.828 P$MYBS MYB proteins with singleP$MYBST1.01 0.9 1262 1278 1 0.953 DNA binding repeat P$HEAT Heat shockfactors P$HSE.01 0.81 1278 1294 1 0.864 P$LEGB Legumin Box familyP$RY.01 0.87 1277 1303 1 0.871 P$MYBS MYB proteins with singleP$OSMYBS.01 0.82 1343 1359 0.75 0.822 DNA binding repeat O$INRE Corepromoter 
initiator elements O$DINR.01 0.94 1349 1359 0.97 0.955 P$STKMStorekeeper motif P$STK.01 0.85 1355 1369 1 0.95 P$GTBX GT-box elementsP$GT1.01 0.85 1403 1419 0.97 0.865 O$VTBP Vertebrate TATA bindingO$ATATA.01 0.78 1439 1455 0.75 0.797 protein factor P$OCSE Enhancerelement first P$OCSL.01 0.69 1437 1457 0.77 0.745 identified in thepromoter of the octopine synthase gene (OCS) of the Agrobacteriumtumefaciens T-DNA P$HEAT Heat shock factors P$HSFA1A.01 0.75 1478 1494 10.764 P$WBXF W Box family P$WRKY.01 0.92 1488 1504 1 0.94 P$TEFB TEF-boxP$TEF1.01 0.76 1491 1511 0.96 0.858 P$MYBS MYB proteins with singleP$HVMCB1.01 0.93 1498 1514 1 0.934 DNA binding repeat P$MYBS MYBproteins with single P$TAMYB80.01 0.83 1509 1525 0.75 0.837 DNA bindingrepeat P$MSAE M-phase-specific activator P$MSA.01 0.8 1551 1565 1 0.802elements P$OPAQ Opaque-2 like transcriptional P$O2.01 0.87 1558 1574 10.883 activators P$AHBP Arabidopsis homeobox P$ATHB5.01 0.89 1569 15790.83 0.904 protein P$AHBP Arabidopsis homeobox P$ATHB5.01 0.89 1569 15790.94 0.978 protein O$VTBP Vertebrate TATA binding O$ATATA.01 0.78 16091625 0.75 0.781 protein factor P$LREM Light responsive elementP$RAP22.01 0.85 1613 1623 1 0.966 motif, not modulated by differentlight qualities P$TEFB TEF-box P$TEF1.01 0.76 1617 1637 0.84 0.812P$WNAC Wheat NAC-domain transcription P$TANAC69.01 0.68 1625 1647 0.90.775 factors P$NACF Plant specific NAC [NAM P$ANAC019.01 0.94 1632 16440.95 0.968 (no apical meristem), ATAF172, CUC2 (cup- shaped cotyledons2)] transcription factors P$GTBX GT-box elements P$S1F.01 0.79 1642 16581 0.917 P$PSRE Pollen-specific regulatory P$GAAA.01 0.83 1644 1660 10.864 elements P$MYBL MYB-like proteins P$MYBPH3.01 0.8 1647 1663 10.938 P$DOFF DNA binding with one finger P$DOF1.01 0.98 1694 1710 1 1(DOF) P$HEAT Heat shock factors P$HSFA1A.01 0.75 1703 1719 0.86 0.757P$CCAF Circadian control factors P$EE.01 0.84 1719 1733 1 0.955 P$MADSMADS box proteins P$AG.01 0.8 1717 1737 0.9 0.816 P$GTBX GT-box elementsP$ASIL1.01 
0.93 1732 1748 1 0.967 O$INRE Core promoter initiatorelements O$DINR.01 0.94 1749 1759 1 0.957 P$SUCB Sucrose boxP$SUCROSE.01 0.81 1749 1767 0.75 0.837 P$SUCB Sucrose box P$SUCROSE.010.81 1754 1772 0.75 0.815 P$L1BX L1 box, motif for L1 layer- P$ATML1.020.76 1757 1773 0.89 0.848 specific expression P$AHBP Arabidopsishomeobox P$ATHB9.01 0.77 1761 1771 0.75 0.815 protein O$VTBP VertebrateTATA binding O$VTATA.02 0.89 1777 1793 1 0.996 protein factor P$DOFF DNAbinding with one finger P$DOF3.01 0.99 1778 1794 1 0.995 (DOF) O$PTBPPlant TATA binding protein O$PTATA.02 0.9 1780 1794 1 0.923 factorP$IBOX Plant I-Box sites P$GATA.01 0.93 1787 1803 1 0.967 P$MYBS MYBproteins with single P$MYBST1.01 0.9 1790 1806 1 0.972 DNA bindingrepeat O$VTBP Vertebrate TATA binding O$ATATA.01 0.78 1803 1819 0.750.797 protein factor P$IBOX Plant I-Box sites P$GATA.01 0.93 1847 1863 10.945 P$MYBS MYB proteins with single P$MYBST1.01 0.9 1850 1866 1 0.966DNA binding repeat P$MADS MADS box proteins P$SQUA.01 0.9 1866 1886 10.916 P$GTBX GT-box elements P$SBF1.01 0.87 1872 1888 1 0.905 O$VTBPVertebrate TATA binding O$LTATA.01 0.82 1873 1889 1 0.837 protein factorP$AHBP Arabidopsis homeobox P$HAHB4.01 0.87 1878 1888 1 0.902 proteinP$L1BX L1 box, motif for L1 layer- P$ATML1.01 0.82 1882 1898 0.75 0.824specific expression O$INRE Core promoter initiator elements O$DINR.010.94 1886 1896 0.97 0.949 P$EPFF EPF-type zinc finger factors,P$ZPT22.01 0.75 1887 1909 1 0.774 two canonical Cys2/His2 zinc fingermotifs separated by spacers of various length P$GAPB GAP-Box (lightresponse P$GAP.01 0.88 1907 1921 1 0.903 elements) P$SUCB Sucrose boxP$SUCROSE.01 0.81 1912 1930 1 0.849 P$HMGF High mobility group factorsP$HMG_IY.01 0.89 1920 1934 1 0.892 P$SEF4 Soybean embryo factor 4P$SEF4.01 0.98 1927 1937 1 0.984 P$MYBL MYB-like proteins P$ATMYB77.010.87 1973 1989 1 0.9 P$GTBX GT-box elements P$ASIL1.01 0.93 1998 2014 10.971 P$OPAQ Opaque-2 like transcriptional P$O2_GCN4.01 0.81 2001 2017 10.83 activators P$IBOX 
Plant I-Box sites P$GATA.01 0.93 2018 2034 10.964 P$MYBS MYB proteins with single P$MYBST1.01 0.9 2021 2037 1 0.957DNA binding repeat P$LREM Light responsive element P$RAP22.01 0.85 20352045 1 0.858 motif, not modulated by different light qualities P$MIIGMYB IIG-type binding sites P$MYBC1.01 0.92 2033 2047 1 0.941 P$HEAT Heatshock factors P$HSFA1A.01 0.75 2041 2057 1 0.792 P$MYBL MYB-likeproteins P$GAMYB.01 0.91 2054 2070 1 0.918 P$GTBX GT-box elementsP$GT1.01 0.85 2056 2072 1 0.876 P$ASRC AS1/AS2 repressor complexP$AS1_AS2_II.01 0.86 2067 2075 1 0.906 P$EINL Ethylen insensitive 3 likeP$TEIL.01 0.92 2098 2106 0.96 0.926 factors O$VTBP Vertebrate TATAbinding O$LTATA.01 0.82 2110 2126 1 0.828 protein factor P$MYBL MYB-likeproteins P$MYBPH3.02 0.76 2110 2126 1 0.807

TABLE 10 Boxes and Motifs identified in the permutated sequence of thep-GOS2_perm1 promoter. Position p-GOS2_perm1 Position Core Matrix FamilyFurther Family Information Matrix Opt. from-to sim. sim. P$NCS1 Nodulinconsensus sequence 1 P$NCS1.01 0.85 6 16 1 0.857 P$MSAE M-phase-specificactivator P$MSA.01 0.8 15 29 1 0.832 elements P$MYBL MYB-like proteinsP$GAMYB.01 0.91 29 45 1 0.92 P$MYBL MYB-like proteins P$WER.01 0.87 3349 1 0.897 P$MADS MADS box proteins P$AGL2.01 0.82 35 55 0.79 0.82P$NACF Plant specific NAC [NAM (no P$IDEF2.01 0.96 48 60 1 0.96 apicalmeristem), ATAF172, CUC2 (cup-shaped cotyledons 2)] transcriptionfactors P$BRRE Brassinosteroid (BR) response P$BZR1.01 0.95 48 64 10.954 element O$PTBP Plant TATA binding protein O$PTATA.01 0.88 60 74 10.887 factor O$VTBP Vertebrate TATA binding O$VTATA.01 0.9 61 77 1 0.961protein factor O$INRE Core promoter initiator elements O$DINR.01 0.94 6575 0.97 0.94 O$VTBP Vertebrate TATA binding O$LTATA.01 0.82 69 85 10.867 protein factor O$VTBP Vertebrate TATA binding O$VTATA.01 0.9 71 870.89 0.92 protein factor O$YTBP Yeast TATA binding protein O$SPT15.010.83 74 90 1 0.832 factor O$YTBP Yeast TATA binding protein O$SPT15.010.83 75 91 1 0.877 factor O$VTBP Vertebrate TATA binding O$ATATA.01 0.7876 92 0.75 0.781 protein factor O$YTBP Yeast TATA binding proteinO$SPT15.01 0.83 77 93 0.76 0.835 factor P$MIIG MYB IIG-type bindingsites P$PALBOXL.01 0.8 118 132 0.77 0.841 P$DOFF DNA binding with onefinger P$DOF1.01 0.98 126 142 1 0.99 (DOF) P$DOFF DNA binding with onefinger P$PBF.01 0.97 149 165 1 0.989 (DOF) P$WNAC Wheat NAC-domaintranscription P$TANAC69.01 0.68 170 192 0.81 0.712 factors O$VTBPVertebrate TATA binding O$ATATA.01 0.78 187 203 1 0.878 protein factorP$E2FF E2F-homolog cell cycle regulators P$E2F.01 0.82 193 207 1 0.826P$AHBP Arabidopsis homeobox protein P$ATHB5.01 0.89 207 217 0.83 0.903P$AHBP Arabidopsis homeobox protein P$HAHB4.01 0.87 207 217 1 0.967P$CNAC Calcium regulated NAC- P$CBNAC.02 0.85 215 235 1 0.937 
factorsP$MYBS MYB proteins with single P$PHR1.01 0.84 217 233 1 0.944 DNAbinding repeat P$OCSE Enhancer element first identified P$OCSL.01 0.69216 236 1 0.735 in the promoter of the octopine synthase gene (OCS) ofthe Agrobacterium tumefaciens T-DNA P$MYBS MYB proteins with singleP$PHR1.01 0.84 222 238 1 0.979 DNA binding repeat P$GTBX GT-box elementsP$SBF1.01 0.87 246 262 1 0.901 P$STKM Storekeeper motif P$STK.01 0.85251 265 1 0.85 P$AHBP Arabidopsis homeobox protein P$ATHB5.01 0.89 254264 0.83 0.904 P$AHBP Arabidopsis homeobox protein P$BLR.01 0.9 254 2641 0.998 P$HEAT Heat shock factors P$HSFA1A.01 0.75 284 300 1 0.757P$CCAF Circadian control factors P$CCA1.01 0.85 297 311 1 0.94 P$LFYBLFY binding site P$LFY.01 0.93 318 330 0.91 0.945 P$WBXF W Box familyP$ERE.01 0.89 322 338 1 0.893 P$GAGA GAGA elements P$BPC.01 1 329 353 11 P$CCAF Circadian control factors P$EE.01 0.84 335 349 0.75 0.865P$GAGA GAGA elements P$BPC.01 1 331 355 1 1 P$CCAF Circadian controlfactors P$CCA1.01 0.85 337 351 1 0.968 P$GTBX GT-box elements P$SBF1.010.87 341 357 1 0.875 P$MADS MADS box proteins P$SQUA.01 0.9 345 365 10.925 P$CCAF Circadian control factors P$EE.01 0.84 363 377 1 0.924O$VTBP Vertebrate TATA binding O$MTATA.01 0.84 383 399 1 0.895 proteinfactor P$CARM CA-rich motif P$CARICH.01 0.78 388 406 1 0.8 P$AHBPArabidopsis homeobox protein P$HAHB4.01 0.87 397 407 1 0.902 O$VTBPVertebrate TATA binding O$LTATA.01 0.82 395 411 1 0.889 protein factorO$VTBP Vertebrate TATA binding O$LTATA.01 0.82 396 412 1 0.844 proteinfactor O$PTBP Plant TATA binding protein O$PTATA.01 0.88 398 412 1 0.892factor O$VTBP Vertebrate TATA binding O$ATATA.01 0.78 397 413 0.75 0.781protein factor P$AHBP Arabidopsis homeobox protein P$HAHB4.01 0.87 400410 1 0.902 O$VTBP Vertebrate TATA binding O$ATATA.01 0.78 402 418 0.750.781 protein factor O$VTBP Vertebrate TATA binding O$VTATA.02 0.89 405421 1 0.983 protein factor O$PTBP Plant TATA binding protein O$PTATA.020.9 408 422 1 0.917 factor P$OCSE Enhancer element 
first identifiedP$OCSTF.01 0.73 426 446 1 0.762 in the promoter of the octopine synthasegene (OCS) of the Agrobacterium tumefaciens T-DNA P$AHBP Arabidopsishomeobox protein P$HAHB4.01 0.87 440 450 1 0.926 P$AHBP Arabidopsishomeobox protein P$WUS.01 0.94 444 454 1 1 P$OPAQ Opaque-2 liketranscriptional P$O2_GCN4.01 0.81 447 463 1 0.819 activators P$SEF4Soybean embryo factor 4 P$SEF4.01 0.98 472 482 1 0.988 P$GTBX GT-boxelements P$SBF1.01 0.87 481 497 1 0.922 P$DOFF DNA binding with onefinger P$DOF1.01 0.98 482 498 1 0.994 (DOF) P$GTBX GT-box elementsP$SBF1.01 0.87 482 498 1 0.9 P$WBXF W Box family P$WRKY11.01 0.94 493509 1 0.957 P$SEF4 Soybean embryo factor 4 P$SEF4.01 0.98 504 514 10.988 P$IBOX Plant I-Box sites P$GATA.01 0.93 509 525 1 0.961 P$NCS1Nodulin consensus sequence 1 P$NCS1.01 0.85 515 525 1 0.948 P$GTBXGT-box elements P$S1F.01 0.79 518 534 0.75 0.793 P$LREM Light responsiveelement P$RAP22.01 0.85 527 537 1 0.897 motif, not modulated bydifferent light qualities P$L1BX L1 box, motif for L1 layer- P$HDG9.010.77 525 541 0.75 0.78 specific expression O$VTBP Vertebrate TATAbinding O$ATATA.01 0.78 539 555 0.75 0.782 protein factor P$ROOT Roothair-specific cis- P$RHE.01 0.77 568 592 0.75 0.772 elements inangiosperms P$ABRE ABA response elements P$ABRE.01 0.82 591 607 1 0.837P$ASRC AS1/AS2 repressor complex P$AS1_AS2_II.01 0.86 599 607 1 0.867P$L1BX L1 box, motif for L1 layer- P$HDG9.01 0.77 629 645 1 0.888specific expression O$VTBP Vertebrate TATA binding O$ATATA.01 0.78 631647 0.75 0.831 protein factor P$L1BX L1 box, motif for L1 layer-P$HDG9.01 0.77 631 647 0.8 0.783 specific expression P$L1BX L1 box,motif for L1 layer- P$PDF2.01 0.85 638 654 1 0.861 specific expressionP$CCAF Circadian control factors P$EE.01 0.84 649 663 1 0.899 P$DOFF DNAbinding with one finger P$PBF.01 0.97 687 703 1 0.987 (DOF) P$GTBXGT-box elements P$SBF1.01 0.87 689 705 1 0.888 P$AHBP Arabidopsishomeobox protein P$BLR.01 0.9 695 705 1 0.929 P$CCAF Circadian controlfactors P$EE.01 0.84 694 
708 1 0.954 P$LREM Light responsive elementP$RAP22.01 0.85 701 711 1 0.98 motif, not modulated by different lightqualities P$HMGF High mobility group factors P$HMG_IY.01 0.89 711 725 10.929 P$AHBP Arabidopsis homeobox protein P$ATHB1.01 0.9 716 726 0.790.901 P$AHBP Arabidopsis homeobox protein P$BLR.01 0.9 716 726 1 0.998O$VTBP Vertebrate TATA binding O$VTATA.02 0.89 716 732 1 0.893 proteinfactor P$SUCB Sucrose box P$SUCROSE.01 0.81 715 733 1 0.856 P$DOFF DNAbinding with one finger P$PBOX.01 0.75 718 734 0.76 0.762 (DOF) P$HEATHeat shock factors P$HSE.01 0.81 718 734 1 0.833 P$GAPB GAP-Box (lightresponse P$GAP.01 0.88 733 747 1 0.917 P$MYBL MYB-like proteinsP$MYBPH3.02 0.76 744 760 0.78 0.834 O$VTBP Vertebrate TATA bindingO$ATATA.01 0.78 754 770 0.75 0.831 protein factor P$TELO Telo box (plantinterstitial P$ATPURA.01 0.85 756 770 0.75 0.869 telomere motifs) P$MYCLMyc-like basic helix-loop- P$OSBHLH66.01 0.85 789 807 1 0.851 helixbinding factors P$BRRE Brassinosteroid (BR) response P$BZR1.01 0.95 793809 1 0.998 element P$URNA Upstream sequence element P$USE.01 0.75 812828 0.75 0.797 of U-snRNA genes P$MADS MADS box proteins P$AGL1.01 0.84812 832 1 0.895 P$MADS MADS box proteins P$AGL1.01 0.84 813 833 0.920.911 P$NCS1 Nodulin consensus sequence 1 P$NCS1.01 0.85 872 882 0.810.888 P$LREM Light responsive element P$RAP22.01 0.85 879 889 1 0.896motif, not modulated by different light qualities P$MSAEM-phase-specific activator P$MSA.01 0.8 880 894 1 0.877 elements P$MYBLMYB-like proteins P$NTMYBAS1.01 0.96 900 916 0.95 0.968 P$GTBX GT-boxelements P$SBF1.01 0.87 909 925 1 0.905 P$MYBL MYB-like proteinsP$AS1_AS2_I.01 0.99 911 927 1 1 P$LREM Light responsive elementP$RAP22.01 0.85 981 991 1 0.893 motif, not modulated by different lightqualities O$PTBP Plant TATA binding protein O$PTATA.02 0.9 982 996 10.951 factor P$L1BX L1 box, motif for L1 layer- P$PDF2.01 0.85 982 998 10.884 specific expression O$VTBP Vertebrate TATA binding O$VTATA.01 0.9983 999 1 0.955 protein factor 
P$MADS MADS box proteins P$AGL15.01 0.791006 1026 0.83 0.8 P$MYBS MYB proteins with single P$ZMMRP1.01 0.79 10081024 0.78 0.811 DNA binding repeat O$PTBP Plant TATA binding proteinO$PTATA.02 0.9 1010 1024 1 0.91 factor P$CGCG Calmodulin binding/CGCGP$ATSR1.01 0.84 1051 1067 1 0.859 box binding proteins P$ABRE ABAresponse elements P$ABF1.01 0.79 1053 1069 1 0.837 P$CE3S Couplingelement 3 sequence P$CE3.01 0.77 1052 1070 1 0.863 P$NACF Plant specificNAC [NAM (no P$ANAC092.01 0.92 1055 1067 1 0.927 apical meristem),ATAF172, CUC2 (cup-shaped cotyledons 2)] transcription factors P$DPBFDc3 promoter binding factors P$DPBF.01 0.89 1057 1067 1 0.908 P$PREMMotifs of plastid response P$MGPROTORE.01 0.77 1059 1089 1 0.806elements O$MTEN Core promoter motif ten elements O$HMTE.01 0.88 10721092 0.96 0.94 P$DREB Dehydration responsive element P$HVDRF1.01 0.891079 1093 1 0.917 binding factors P$PREM Motifs of plastid responseP$MGPROTORE.01 0.77 1077 1107 1 0.807 elements O$MTEN Core promotermotif ten elements O$DMTE.01 0.77 1097 1117 0.84 0.805 P$OPAQ Opaque-2like transcriptional P$O2.02 0.87 1135 1151 1 0.915 activators P$SALTSalt/drought responsive elements P$ALFIN1.02 0.95 1136 1150 1 0.954P$L1BX L1 box, motif for L1 layer- P$PDF2.01 0.85 1179 1195 1 0.882specific expression P$SBPD SBP-domain proteins P$SBP.01 0.88 1199 1215 10.912 P$PALA Conserved box A in PAL and P$PALBOXA.01 0.84 1201 1219 10.863 4CL gene promoters P$MYBS MYB proteins with single P$ZMMRP1.010.79 1230 1246 1 0.833 DNA binding repeat P$AHBP Arabidopsis homeoboxprotein P$ATHB9.01 0.77 1244 1254 1 0.89 P$AHBP Arabidopsis homeoboxprotein P$ATHB9.01 0.77 1244 1254 0.75 0.777 P$MADS MADS box proteinsP$AGL2.01 0.82 1248 1268 0.97 0.835 P$MYBS MYB proteins with singleP$MYBST1.01 0.9 1262 1278 1 0.953 DNA binding repeat P$HEAT Heat shockfactors P$HSE.01 0.81 1278 1294 1 0.864 P$LEGB Legumin Box familyP$RY.01 0.87 1277 1303 1 0.871 P$MYBS MYB proteins with singleP$OSMYBS.01 0.82 1343 1359 0.75 0.822 DNA binding repeat 
O$INRE Corepromoter initiator elements O$DINR.01 0.94 1349 1359 0.97 0.955 P$STKMStorekeeper motif P$STK.01 0.85 1355 1369 1 0.927 P$GTBX GT-box elementsP$GT1.01 0.85 1403 1419 0.97 0.865 P$OCSE Enhancer element firstidentified P$OCSL.01 0.69 1437 1457 0.77 0.703 in the promoter of theoctopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNAP$HEAT Heat shock factors P$HSFA1A.01 0.75 1478 1494 1 0.764 P$WBXF WBox family P$ERE.01 0.89 1488 1504 1 0.968 P$TEFB TEF-box P$TEF1.01 0.761491 1511 0.96 0.852 P$MYBS MYB proteins with single P$HVMCB1.01 0.931498 1514 1 0.934 DNA binding repeat P$MYBS MYB proteins with singleP$TAMYB80.01 0.83 1509 1525 0.75 0.837 DNA binding repeat P$MSAEM-phase-specific activator P$MSA.01 0.8 1551 1565 1 0.82 elements P$OPAQOpaque-2 like transcriptional P$O2.01 0.87 1558 1574 1 0.883 activatorsP$AHBP Arabidopsis homeobox protein P$ATHB5.01 0.89 1569 1579 0.83 0.904P$AHBP Arabidopsis homeobox protein P$ATHB5.01 0.89 1569 1579 0.94 0.978O$VTBP Vertebrate TATA binding O$ATATA.01 0.78 1609 1625 0.75 0.781protein factor P$LREM Light responsive element P$RAP22.01 0.85 1613 16231 0.966 motif, not modulated by different light qualities P$TEFB TEF-boxP$TEF1.01 0.76 1617 1637 0.84 0.761 P$WNAC Wheat NAC-domaintranscription P$TANAC69.01 0.68 1625 1647 0.9 0.75 factors P$NACF Plantspecific NAC [NAM (no P$ANAC019.01 0.94 1632 1644 0.95 0.968 apicalmeristem), ATAF172, CUC2 (cup-shaped cotyledons 2)] transcriptionfactors P$GTBX GT-box elements P$S1F.01 0.79 1642 1658 1 0.882 P$PSREPollen-specific regulatory P$GAAA.01 0.83 1644 1660 1 0.864 elementsP$MYBL MYB-like proteins P$MYBPH3.01 0.8 1647 1663 1 0.938 P$DOFF DNAbinding with one finger P$DOF1.01 0.98 1694 1710 1 1 (DOF) P$HEAT Heatshock factors P$HSFA1A.01 0.75 1703 1719 0.86 0.765 P$CCAF Circadiancontrol factors P$EE.01 0.84 1719 1733 1 0.955 P$MADS MADS box proteinsP$AG.01 0.8 1717 1737 0.9 0.816 P$GTBX GT-box elements P$ASIL1.01 0.931732 1748 1 0.98 O$INRE Core promoter initiator elements 
O$DINR.01 0.941749 1759 1 0.957 P$SUCB Sucrose box P$SUCROSE.01 0.81 1749 1767 0.750.837 P$SUCB Sucrose box P$SUCROSE.01 0.81 1754 1772 0.75 0.815 P$L1BXL1 box, motif for L1 layer- P$ATML1.02 0.76 1757 1773 0.89 0.848specific expression P$AHBP Arabidopsis homeobox protein P$ATHB9.01 0.771761 1771 0.75 0.815 P$MADS MADS box proteins P$AGL3.01 0.83 1768 17880.97 0.838 O$VTBP Vertebrate TATA binding O$VTATA.02 0.89 1777 1793 10.996 protein factor P$DOFF DNA binding with one finger P$DOF3.01 0.991778 1794 1 0.995 (DOF) O$PTBP Plant TATA binding protein O$PTATA.02 0.91780 1794 1 0.923 factor P$IBOX Plant I-Box sites P$GATA.01 0.93 17871803 1 0.967 P$MYBS MYB proteins with single P$MYBST1.01 0.9 1790 1806 10.972 DNA binding repeat O$VTBP Vertebrate TATA binding O$ATATA.01 0.781803 1819 0.75 0.797 protein factor P$IBOX Plant I-Box sites P$GATA.010.93 1847 1863 1 0.945 P$MYBS MYB proteins with single P$MYBST1.01 0.91850 1866 1 0.966 DNA binding repeat P$MADS MADS box proteins P$SQUA.010.9 1866 1886 1 0.916 P$GTBX GT-box elements P$SBF1.01 0.87 1872 1888 10.905 O$VTBP Vertebrate TATA binding O$LTATA.01 0.82 1873 1889 1 0.837protein factor P$AHBP Arabidopsis homeobox protein P$HAHB4.01 0.87 18781888 1 0.902 P$L1BX L1 box, motif for L1 layer- P$ATML1.01 0.82 18821898 0.75 0.824 specific expression O$INRE Core promoter initiatorelements O$DINR.01 0.94 1886 1896 0.97 0.949 P$EPFF EPF-type zinc fingerfactors, P$ZPT22.01 0.75 1887 1909 1 0.752 two canonical Cys2/His2 zincfinger motifs separated by spacers of various length P$GAPB GAP-Box(light response P$GAP.01 0.88 1907 1921 1 0.903 elements) P$SUCB Sucrosebox P$SUCROSE.01 0.81 1912 1930 1 0.849 P$HMGF High mobility groupfactors P$HMG_IY.01 0.89 1920 1934 1 0.892 P$SEF4 Soybean embryo factor4 P$SEF4.01 0.98 1927 1937 1 0.984 P$MYBL MYB-like proteins P$ATMYB77.010.87 1973 1989 1 0.9 P$GTBX GT-box elements P$ASIL1.01 0.93 1998 2014 10.958 P$OPAQ Opaque-2 like transcriptional P$O_GCN4.01 0.81 2001 2017 10.875 activators P$IBOX 
Plant I-Box sites P$GATA.01 0.93 2018 2034 10.964 P$MYBS MYB proteins with single P$MYBST1.01 0.9 2021 2037 1 0.957DNA binding repeat P$LREM Light responsive element P$RAP22.01 0.85 20352045 1 0.868 motif, not modulated by different light qualities P$MIIGMYB IIG-type binding sites P$MYBC1.01 0.92 2033 2047 1 0.938 P$HEAT Heatshock factors P$HSFA1A.01 0.75 2041 2057 1 0.792 P$MYBL MYB-likeproteins P$GAMYB.01 0.91 2054 2070 1 0.918 P$GTBX GT-box elementsP$GT1.01 0.85 2056 2072 1 0.876 P$ASRC AS1/AS2 repressor complexP$AS1_AS2_II.01 0.86 2067 2075 1 0.906 P$ASRC AS1/AS2 repressor complexP$AS1_AS2_II.01 0.86 2075 2083 1 0.906 P$EINL Ethylen insensitive 3 likefactors P$TEIL.01 0.92 2098 2106 0.96 0.926 O$VTBP Vertebrate TATAbinding O$LTATA.01 0.82 2110 2126 1 0.828 protein factor P$MYBL MYB-likeproteins P$MYBPH3.02 0.76 2110 2126 1 0.807

TABLE 11 Boxes and Motifs identified in the permutated sequence of thep-GOS2_perm2 promoter. Position p-GOS2_perm2 Position Core Matrix FamilyFurther Family Information Matrix Opt. from-to sim. sim. P$NCS1 Nodulinconsensus sequence 1 P$NCS1.01 0.85 6 16 1 0.857 P$MSAE M-phase-specificactivator P$MSA.01 0.8 15 29 1 0.832 elements P$MYBL MYB-like proteinsP$GAMYB.01 0.91 29 45 1 0.95 P$MYBL MYB-like proteins P$WER.01 0.87 3349 1 0.897 P$MADS MADS box proteins P$AGL2.01 0.82 35 55 0.789 0.82P$NACF Plant specific NAC [NAM P$IDEF2.01 0.96 48 60 1 0.96 (no apicalmeristem), ATAF172, CUC2 (cup- shaped cotyledons 2)] transcriptionfactors P$BRRE Brassinosteroid (BR) response P$BZR1.01 0.95 48 64 10.954 element O$PTBP Plant TATA binding protein O$PTATA.01 0.88 60 74 10.883 factor O$VTBP Vertebrate TATA binding O$VTATA.01 0.9 61 77 1 0.961protein factor O$INRE Core promoter initiator elements O$DINR.01 0.94 6575 0.969 0.94 O$VTBP Vertebrate TATA binding O$LTATA.01 0.82 69 85 10.867 protein factor O$VTBP Vertebrate TATA binding O$VTATA.01 0.9 71 870.892 0.92 protein factor O$YTBP Yeast TATA binding protein O$SPT15.010.83 74 90 1 0.832 factor O$YTBP Yeast TATA binding protein O$SPT15.010.83 75 91 1 0.877 factor O$VTBP Vertebrate TATA binding O$ATATA.01 0.7876 92 0.75 0.781 protein factor O$YTBP Yeast TATA binding proteinO$SPT15.01 0.83 77 93 0.755 0.835 factor P$MIIG MYB IIG-type bindingsites P$PALBOXL.01 0.8 118 132 0.768 0.841 P$DOFF DNA binding with onefinger P$DOF1.01 0.98 126 142 1 0.99 (DOF) P$DOFF DNA binding with onefinger P$PBF.01 0.97 149 165 1 0.989 (DOF) P$WNAC Wheat NAC-domaintranscription P$TANAC69.01 0.68 170 192 0.812 0.713 factors O$VTBPVertebrate TATA binding O$ATATA.01 0.78 187 203 1 0.869 protein factorP$E2FF E2F-homolog cell cycle P$E2F.01 0.82 193 207 1 0.829 regulatorsO$INRE Core promoter initiator elements O$DINR.01 0.94 200 210 0.9690.945 P$AHBP Arabidopsis homeobox P$ATHB5.01 0.89 207 217 0.83 0.903protein P$AHBP Arabidopsis homeobox P$HAHB4.01 0.87 207 
217 1 0.967protein P$CNAC Calcium regulated NAC- P$CBNAC.02 0.85 215 235 1 0.95factors P$MYBS MYB proteins with single P$PHR1.01 0.84 217 233 1 0.975DNA binding repeat P$OCSE Enhancer element first P$OCSL.01 0.69 216 2361 0.71 identified in the promoter of the octopine synthase gene (OCS) ofthe Agrobacterium tumefaciens T-DNA P$MYBS MYB proteins with singleP$PHR1.01 0.84 222 238 1 0.922 DNA binding repeat P$GTBX GT-box elementsP$SBF1.01 0.87 246 262 1 0.901 P$STKM Storekeeper motif P$STK.01 0.85251 265 1 0.85 P$AHBP Arabidopsis homeobox P$ATHB5.01 0.89 254 264 0.830.904 protein P$AHBP Arabidopsis homeobox P$BLR.01 0.9 254 264 1 0.998protein P$HEAT Heat shock factors P$HSFA1A.01 0.75 284 300 1 0.784P$CCAF Circadian control factors P$CCA1.01 0.85 297 311 1 0.947 P$LFYBLFY binding site P$LFY.01 0.93 318 330 0.914 0.945 P$GAGA GAGA elementsP$BPC.01 1 329 353 1 1 P$CCAF Circadian control factors P$EE.01 0.84 335349 0.75 0.865 P$GAGA GAGA elements P$BPC.01 1 331 355 1 1 P$CCAFCircadian control factors P$CCA1.01 0.85 337 351 1 0.968 P$GTBX GT-boxelements P$SBF1.01 0.87 341 357 1 0.875 P$MADS MADS box proteinsP$SQUA.01 0.9 345 365 1 0.925 P$CCAF Circadian control factors P$EE.010.84 363 377 1 0.925 O$VTBP Vertebrate TATA binding O$MTATA.01 0.84 383399 1 0.91 protein factor P$CARM CA-rich motif P$CARICH.01 0.78 388 4061 0.785 P$AHBP Arabidopsis homeobox P$HAHB4.01 0.87 397 407 1 0.902protein O$VTBP Vertebrate TATA binding O$LTATA.01 0.82 395 411 1 0.889protein factor O$VTBP Vertebrate TATA binding O$LTATA.01 0.82 396 412 10.844 protein factor O$PTBP Plant TATA binding protein O$PTATA.01 0.88398 412 1 0.892 factor O$VTBP Vertebrate TATA binding O$ATATA.01 0.78397 413 0.75 0.781 protein factor P$NCS2 Arabidopsis homeobox P$HAHB4.010.87 400 410 1 0.902 protein P$MSAE Vertebrate TATA binding O$ATATA.010.78 402 418 0.75 0.781 protein factor P$MYBL Vertebrate TATA bindingO$VTATA.02 0.89 405 421 1 0.983 protein factor P$MYBL Plant TATA bindingprotein O$PTATA.02 0.9 408 422 1 0.917 
factor P$OCSE Enhancer elementfirst P$OCSTF.01 0.73 426 446 1 0.733 identified in the promoter of theoctopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNAP$AHBP Arabidopsis homeobox P$HAHB4.01 0.87 440 450 1 0.921 proteinP$AHBP Arabidopsis homeobox P$WUS.01 0.94 444 454 1 1 protein P$OPAQOpaque-2 like transcriptional P$O2_GCN4.01 0.81 447 463 1 0.819activators P$SEF4 Soybean embryo factor 4 P$SEF4.01 0.98 472 482 1 0.987P$GTBX GT-box elements P$SBF1.01 0.87 481 497 1 0.922 P$DOFF DNA bindingwith one finger P$DOF1.01 0.98 482 498 1 0.994 (DOF) P$GTBX GT-boxelements P$SBF1.01 0.87 482 498 1 0.9 P$WBXF W Box family P$WRKY11.010.94 493 509 1 0.957 P$SEF4 Soybean embryo factor 4 P$SEF4.01 0.98 504514 1 0.998 P$IBOX Plant I-Box sites P$GATA.01 0.93 509 525 1 0.986P$NCS1 Nodulin consensus sequence 1 P$NCS1.01 0.85 515 525 1 0.948P$GTBX GT-box elements P$S1F.01 0.79 518 534 0.75 0.793 P$LREM Lightresponsive element P$RAP22.01 0.85 527 537 1 0.897 motif, not modulatedby different light qualities P$L1BX L1 box, motif for L1 layer-P$ATML1.01 0.82 525 541 0.75 0.825 specific expression O$VTBP VertebrateTATA binding O$ATATA.01 0.78 539 555 0.75 0.782 protein factor P$ROOTRoot hair-specific cis- P$RHE.01 0.77 568 592 0.75 0.787 elements inangiosperms P$ABRE ABA response elements P$ABRE.01 0.82 591 607 1 0.837P$ASRC AS1/AS2 repressor complex P$AS1_AS2_II.01 0.86 599 607 1 0.867P$L1BX L1 box, motif for L1 layer- P$HDG9.01 0.77 629 645 1 0.883specific expression P$L1BX L1 box, motif for L1 layer- P$HDG9.01 0.77631 647 0.797 0.776 specific expression P$L1BX L1 box, motif for L1layer- P$ATML1.01 0.82 638 654 1 0.886 specific expression P$CCAFCircadian control factors P$EE.01 0.84 649 663 1 0.891 P$DOFF DNAbinding with one finger P$PBF.01 0.97 687 703 1 0.987 (DOF) P$GTBXGT-box elements P$SBF1.01 0.87 689 705 1 0.888 P$AHBP Arabidopsishomeobox P$BLR.01 0.9 695 705 1 0.929 protein P$CCAF Circadian controlfactors P$EE.01 0.84 694 708 1 0.954 P$LREM Light responsive 
elementP$RAP22.01 0.85 701 711 1 1 motif, not modulated by different lightqualities P$MADS MADS box proteins P$RIN.01 0.77 699 719 1 0.776 P$HMGFHigh mobility group factors P$HMG_IY.01 0.89 711 725 1 0.924 P$AHBPArabidopsis homeobox P$ATHB1.01 0.9 716 726 0.789 0.901 protein P$AHBPArabidopsis homeobox P$BLR.01 0.9 716 726 1 0.998 protein O$VTBPVertebrate TATA binding O$VTATA.02 0.89 716 732 1 0.893 protein factorP$SUCB Sucrose box P$SUCROSE.01 0.81 715 733 1 0.856 P$DOFF DNA bindingwith one finger P$PBOX.01 0.75 718 734 0.761 0.762 (DOF) P$HEAT Heatshock factors P$HSE.01 0.81 718 734 1 0.833 P$GAPB GAP-Box (lightresponse P$GAP.01 0.88 733 747 1 0.885 elements) P$MYBL MYB-likeproteins P$MYBPH3.02 0.76 744 760 0.779 0.834 O$VTBP Vertebrate TATAbinding O$ATATA.01 0.78 754 770 0.75 0.831 protein factor P$TELO Telobox (plant interstitial P$ATPURA.01 0.85 756 770 0.75 0.869 telomeremotifs) P$MYCL Myc-like basic helix-loop- P$OSBHLH66.01 0.85 789 807 10.851 helix binding factors P$BRRE Brassinosteroid (BR) responseP$BZR1.01 0.95 793 809 1 0.998 element P$URNA Upstream sequence elementP$USE.01 0.75 812 828 0.75 0.797 of U-snRNA genes P$MADS MADS boxproteins P$AGL1.01 0.84 812 832 1 0.895 P$MADS MADS box proteinsP$AGL1.01 0.84 813 833 0.915 0.911 P$NCS1 Nodulin consensus sequence 1P$NCS1.01 0.85 872 882 0.805 0.888 P$LREM Light responsive elementP$RAP22.01 0.85 879 889 1 0.896 motif, not modulated by different lightqualities P$MSAE M-phase-specific activator P$MSA.01 0.8 880 894 1 0.877elements P$MYBL MYB-like proteins P$NTMYBAS1.01 0.96 900 916 0.949 0.968P$GTBX GT-box elements P$SBF1.01 0.87 909 925 1 0.905 P$MYBL MYB-likeproteins P$AS1_AS2_I.01 0.99 911 927 1 1 P$LREM Light responsive elementP$RAP22.01 0.85 981 991 1 0.893 motif, not modulated by different lightqualities O$PTBP Plant TATA binding protein O$PTATA.02 0.9 982 996 1 1factor P$L1BX L1 box, motif for L1 layer- P$PDF2.01 0.85 982 998 1 0.884specific expression O$VTBP Vertebrate TATA binding O$VTATA.01 0.9 
983999 1 0.973 protein factor P$MADS MADS box proteins P$AGL15.01 0.79 10061026 0.825 0.793 P$MYBS MYB proteins with single P$ZMMRP1.01 0.79 10081024 0.778 0.811 DNA binding repeat O$PTBP Plant TATA binding proteinO$PTATA.02 0.9 1010 1024 1 0.91 factor P$CGCG Calmodulin binding/P$ATSR1.01 0.84 1051 1067 1 0.859 CGCG box binding proteins P$ABRE ABAresponse elements P$ABF1.01 0.79 1053 1069 1 0.797 P$CE3S Couplingelement 3 sequence P$CE3.01 0.77 1052 1070 1 0.874 P$NACF Plant specificNAC [NAM P$ANAC092.01 0.92 1055 1067 1 0.924 (no apical meristem),ATAF172, CUC2 (cup- shaped cotyledons 2)] transcription factors P$DPBFDc3 promoter binding factors P$DPBF.01 0.89 1057 1067 1 0.908 P$PREMMotifs of plastid response P$MGPROTORE.01 0.77 1059 1089 1 0.806elements O$MTEN Core promoter motif ten O$HMTE.01 0.88 1072 1092 0.9610.94 elements P$DREB Dehydration responsive P$HVDRF1.01 0.89 1079 1093 10.922 element binding factors P$PREM Motifs of plastid responseP$MGPROTORE.01 0.77 1077 1107 1 0.784 O$MTEN Core promoter motif tenO$DMTE.01 0.77 1097 1117 0.844 0.802 elements P$OPAQ Opaque-2 liketranscriptional P$O2.02 0.87 1135 1151 1 0.915 activators P$SALTSalt/drought responsive P$ALFIN1.02 0.95 1136 1150 1 0.954 elementsP$L1BX L1 box, motif for L1 layer- P$PDF2.01 0.85 1179 1195 1 0.882specific expression P$SBPD SBP-domain proteins P$SBP.01 0.88 1199 1215 10.912 P$PALA Conserved box A in PAL P$PALBOXA.01 0.84 1201 1219 1 0.863and 4CL gene promoters P$MYBS MYB proteins with single P$ZMMRP1.01 0.791230 1246 1 0.838 DNA binding repeat P$AHBP Arabidopsis homeoboxP$ATHB9.01 0.77 1244 1254 1 0.777 protein P$MADS MADS box proteinsP$AGL2.01 0.82 1248 1268 0.969 0.828 P$MYBS MYB proteins with singleP$MYBST1.01 0.9 1262 1278 1 0.953 DNA binding repeat P$HEAT Heat shockfactors P$HSE.01 0.81 1278 1294 1 0.864 P$LEGB Legumin Box familyP$RY.01 0.87 1277 1303 1 0.871 P$MYBS MYB proteins with singleP$OSMYBS.01 0.82 1343 1359 0.75 0.822 DNA binding repeat O$INRE Corepromoter initiator elements 
O$DINR.01 0.94 1349 1359 0.969 0.955 P$STKMStorekeeper motif P$STK.01 0.85 1355 1369 1 0.95 P$GTBX GT-box elementsP$GT1.01 0.85 1403 1419 0.969 0.85 O$VTBP Vertebrate TATA bindingO$ATATA.01 0.78 1439 1455 0.75 0.797 protein factor P$OCSE Enhancerelement first P$OCSL.01 0.69 1437 1457 0.769 0.734 identified in thepromoter of the octopine synthase gene (OCS) of the Agrobacteriumtumefaciens T-DNA P$HEAT Heat shock factors P$HSFA1A.01 0.75 1478 1494 10.764 P$WBXF W Box family P$WRKY.01 0.92 1488 1504 1 0.94 P$TEFB TEF-boxP$TEF1.01 0.76 1491 1511 0.957 0.859 P$MYBS MYB proteins with singleP$HVMCB1.01 0.93 1498 1514 1 0.934 DNA binding repeat P$MYBS MYBproteins with single P$TAMYB80.01 0.83 1509 1525 0.75 0.845 DNA bindingrepeat P$MSAE M-phase-specific activator P$MSA.01 0.8 1551 1565 1 0.807elements P$OPAQ Opaque-2 like transcriptional- P$O2.01 0.87 1558 1574 10.883 activators P$AHBP Arabidopsis homeobox P$ATHB5.01 0.89 1569 15790.83 0.904 protein P$AHBP Arabidopsis homeobox P$ATHB5.01 0.89 1569 15790.936 0.978 protein O$VTBP Vertebrate TATA binding O$ATATA.01 0.78 16091625 0.75 0.781 protein factor P$LREM Light responsive elementP$RAP22.01 0.85 1613 1623 1 0.966 motif, not modulated by differentlight qualities P$TEFB TEF-box P$TEF1.01 0.76 1617 1637 0.839 0.812P$WNAC Wheat NAC-domain transcription P$TANAC69.01 0.68 1625 1647 0.8960.811 factors P$NACF Plant specific NAC [NAM P$ANAC019.01 0.94 1632 16440.953 0.968 (no apical meristem), ATAF172, CUC2 (cup- shaped cotyledons2)] transcription factors P$GTBX GT-box elements P$S1F.01 0.79 1642 16581 0.917 P$PSRE Pollen-specific regulatory P$GAAA.01 0.83 1644 1660 10.864 elements P$MYBL MYB-like proteins P$MYBPH3.01 0.8 1647 1663 10.938 P$DOFF DNA binding with one finger P$DOF1.01 0.98 1694 1710 1 1(DOF) P$HEAT Heat shock factors P$HSFA1A.01 0.75 1703 1719 0.857 0.757P$CCAF Circadian control factors P$EE.01 0.84 1719 1733 1 0.953 P$MADSMADS box proteins P$AG.01 0.8 1717 1737 0.902 0.813 P$GTBX GT-boxelements P$ASIL1.01 0.93 
1732 1748 1 0.967 O$INRE Core promoterinitiator elements O$DINR.01 0.94 1749 1759 1 0.965 P$SUCB Sucrose boxP$SUCROSE.01 0.81 1749 1767 0.75 0.83 P$SUCB Sucrose box P$SUCROSE.010.81 1754 1772 0.75 0.822 P$L1BX L1 box, motif for L1 layer- P$ATML1.020.76 1757 1773 0.89 0.848 specific expression P$AHBP Arabidopsishomeobox P$ATHB9.01 0.77 1761 1771 0.75 0.815 protein O$VTBP VertebrateTATA binding O$VTATA.02 0.89 1777 1793 1 0.996 protein factor P$DOFF DNAbinding with one finger P$DOF3.01 0.99 1778 1794 1 0.995 (DOF) O$PTBPPlant TATA binding protein O$PTATA.02 0.9 1780 1794 1 0.923 factorP$IBOX Plant I-Box sites P$GATA.01 0.93 1787 1803 1 0.967 P$MYBS MYBproteins with single P$MYBST1.01 0.9 1790 1806 1 0.972 DNA bindingrepeat O$VTBP Vertebrate TATA binding O$ATATA.01 0.78 1803 1819 0.750.812 protein factor P$IBOX Plant I-Box sites P$GATA.01 0.93 1847 1863 10.945 P$MYBS MYB proteins with single P$MYBST1.01 0.9 1850 1866 1 0.966DNA binding repeat P$MADS MADS box proteins P$SQUA.01 0.9 1866 1886 10.916 P$GTBX GT-box elements P$SBF1.01 0.87 1872 1888 1 0.905 O$VTBPVertebrate TATA binding O$LTATA.01 0.82 1873 1889 1 0.837 protein factorP$AHBP Arabidopsis homeobox P$HAHB4.01 0.87 1878 1888 1 0.902 proteinP$L1BX L1 box, motif for L1 layer- P$ATML1.01 0.82 1882 1898 0.75 0.824specific expression O$INRE Core promoter initiator elements O$DINR.010.94 1886 1896 0.969 0.949 P$EPFF EPF-type zinc finger factors,P$ZPT22.01 0.75 1887 1909 1 0.755 two canonical Cys2/His2 zinc fingermotifs separated by spacers of various length P$GAPB GAP-Box (lightresponse P$GAP.01 0.88 1907 1921 1 0.903 elements) P$SUCB Sucrose boxP$SUCROSE.01 0.81 1912 1930 1 0.849 P$HMGF High mobility group factorsP$HMG_IY.01 0.89 1920 1934 1 0.892 P$SEF4 Soybean embryo factor 4P$SEF4.01 0.98 1927 1937 1 0.984 P$MYBL MYB-like proteins P$ATMYB77.010.87 1973 1989 1 0.894 P$GTBX GT-box elements P$ASIL1.01 0.93 1998 20141 0.971 P$OPAQ Opaque-2 like transcriptional P$O2_GCN4.01 0.81 2001 20171 0.83 activators P$IBOX Plant 
I-Box sites P$GATA.01 0.93 2018 2034 10.964 P$MYBS MYB proteins with single P$MYBST1.01 0.9 2021 2037 1 0.957DNA binding repeat P$LREM Light responsive element P$RAP22.01 0.85 20352045 1 0.858 motif, not modulated by different light qualities P$MIIGMYB IIG-type binding sites P$MYBC1.01 0.92 2033 2047 1 0.941 P$HEAT Heatshock factors P$HSFA1A.01 0.75 2041 2057 1 0.801 P$MYBL MYB-likeproteins P$GAMYB.01 0.91 2054 2070 1 0.918 P$GTBX GT-box elementsP$GT1.01 0.85 2056 2072 1 0.876 P$ASRC AS1/AS2 repressor complexP$AS1_AS2_II.01 0.86 2067 2075 1 0.906 P$EINL Ethylen insensitive 3 likeP$TEIL.01 0.92 2098 2106 0.964 0.926 factors O$VTBP Vertebrate TATAbinding O$LTATA.01 0.82 2110 2126 1 0.828 protein factor P$MYBL MYB-likeproteins P$MYBPH3.02 0.76 2110 2126 1 0.807
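
The hit lists in Tables 10 and 11 can be compared programmatically to see which matrix matches the two permutated promoters share. What follows is a minimal sketch in Python, assuming the rows have already been transcribed into (matrix, from, to) tuples. The names `Hit` and `compare_hits` are hypothetical helpers introduced for illustration, and only a four-row excerpt of each table is used.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One table row reduced to matrix name and position (from, to)."""
    matrix: str
    start: int
    end: int


def compare_hits(a, b):
    """Return (shared, only_a, only_b), each sorted by position."""
    set_a, set_b = set(a), set(b)

    def by_position(hit):
        return (hit.start, hit.end, hit.matrix)

    return (
        sorted(set_a & set_b, key=by_position),
        sorted(set_a - set_b, key=by_position),
        sorted(set_b - set_a, key=by_position),
    )


# Excerpt transcribed from Table 10 (p-GOS2_perm1).
perm1 = [
    Hit("P$NCS1.01", 6, 16),
    Hit("P$MSA.01", 15, 29),
    Hit("P$E2F.01", 193, 207),
    Hit("P$ERE.01", 322, 338),
]

# Excerpt transcribed from Table 11 (p-GOS2_perm2).
perm2 = [
    Hit("P$NCS1.01", 6, 16),
    Hit("P$MSA.01", 15, 29),
    Hit("P$E2F.01", 193, 207),
    Hit("O$DINR.01", 200, 210),
]

shared, only_perm1, only_perm2 = compare_hits(perm1, perm2)
for label, hits in (("shared", shared),
                    ("perm1 only", only_perm1),
                    ("perm2 only", only_perm2)):
    print(label, [f"{h.matrix}:{h.start}-{h.end}" for h in hits])
```

Matching on matrix name and exact position is a deliberately strict criterion; a comparison that tolerates small positional shifts or similarity-score thresholds would need additional assumptions.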

5.2 Vector Construction

The DNA fragments representing the promoters p-GOS2_perm1 (SEQ ID NO: 14) and p-GOS2_perm2 (SEQ ID NO: 15), respectively, were generated by gene synthesis. Endonucleolytic restriction sites suitable for cloning the promoter fragments were included in the synthesis. The p-GOS2_perm1 (SEQ ID NO: 14) and p-GOS2_perm2 (SEQ ID NO: 15) promoters are cloned into destination vectors compatible with the Multisite Gateway System, upstream of an attachment site and a terminator, using the SwaI restriction endonuclease.

The beta-glucuronidase (GUS) or uidA gene, which encodes an enzyme for which various chromogenic substrates are known, is used as the reporter for determining the expression features of the permutated p-GOS2_perm1 (SEQ ID NO: 14) and p-GOS2_perm2 (SEQ ID NO: 15) promoter sequences.

A pENTR/A vector harboring the beta-glucuronidase reporter gene c-GUS (with the prefix c- denoting coding sequence) is constructed using site-specific recombination (BP reaction). By performing a site-specific recombination (LR reaction), the created pENTR/A is combined with the destination vector according to the manufacturer's (Invitrogen, Carlsbad, Calif., USA) Multisite Gateway manual. The reaction yields a binary vector with the p-GOS2_perm1 promoter (SEQ ID NO: 14) or the p-GOS2_perm2 promoter (SEQ ID NO: 15), respectively, the beta-glucuronidase coding sequence c-GUS and a terminator.

5.3 Generation of Transgenic Rice Plants

The Agrobacterium containing the respective expression vector is used to transform Oryza sativa plants. Mature dry seeds of the rice japonica cultivar Nipponbare are dehusked. Sterilization is carried out by incubating for one minute in 70% ethanol, followed by 30 minutes in 0.2% HgCl₂, followed by six 15-minute washes with sterile distilled water. The sterile seeds are then germinated on a medium containing 2,4-D (callus induction medium). After incubation in the dark for four weeks, embryogenic, scutellum-derived calli are excised and propagated on the same medium. After two weeks, the calli are multiplied or propagated by subculture on the same medium for another 2 weeks. Embryogenic callus pieces are sub-cultured on fresh medium 3 days before co-cultivation (to boost cell division activity).

Agrobacterium strain LBA4404 containing the respective expression vector is used for co-cultivation. Agrobacterium is inoculated on AB medium with the appropriate antibiotics and cultured for 3 days at 28° C. The bacteria are then collected and suspended in liquid co-cultivation medium to a density (OD₆₀₀) of about 1. The suspension is then transferred to a Petri dish and the calli are immersed in the suspension for 15 minutes. The callus tissues are then blotted dry on a filter paper, transferred to solidified co-cultivation medium and incubated for 3 days in the dark at 25° C. Co-cultivated calli are grown on 2,4-D-containing medium for 4 weeks in the dark at 28° C. in the presence of a selection agent. During this period, rapidly growing resistant callus islands develop. After transfer of this material to a regeneration medium and incubation in the light, the embryogenic potential is released and shoots develop in the next four to five weeks. Shoots are excised from the calli and incubated for 2 to 3 weeks on an auxin-containing medium, from which they are transferred to soil. Hardened shoots are grown under high humidity and short days in a greenhouse.

The primary transformants are transferred from a tissue culture chamber to a greenhouse. After a quantitative PCR analysis to verify the copy number of the T-DNA insert, only single-copy transgenic plants that exhibit tolerance to the selection agent are kept for harvest of T1 seed. Seeds are then harvested three to five months after transplanting. The method yields single-locus transformants at a rate of over 50% (Aldemita and Hodges 1996, Chan et al. 1993, Hiei et al. 1994).
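The text does not state how the quantitative PCR data are converted into a copy number. One common approach, shown here only as a sketch under that assumption, is the comparative Ct calculation against a calibrator plant known to carry a single copy, with an endogenous reference gene for normalization and roughly 100% amplification efficiency. The function name and all Ct values are hypothetical.

```python
def relative_copy_number(
    ct_target_sample: float,
    ct_ref_sample: float,
    ct_target_calibrator: float,
    ct_ref_calibrator: float,
) -> float:
    """Estimate T-DNA copy number relative to a single-copy calibrator.

    Comparative Ct calculation, assuming about 100% amplification
    efficiency, i.e. the product doubles in every cycle.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values, for illustration only.
    # Same Ct difference as the calibrator: consistent with one copy.
    print(relative_copy_number(24.0, 22.0, 24.0, 22.0))
    # Target amplifies one cycle earlier: consistent with two copies.
    print(relative_copy_number(23.0, 22.0, 24.0, 22.0))
```

In practice, a plant would be scored as single copy when the estimate falls close to 1 within a tolerance chosen by the experimenter.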

Example 6 Expression profile of the p-GOS2_perm1 (SEQ ID NO: 14) and p-GOS2_perm2 (SEQ ID NO: 15) control elements

To demonstrate and analyze the transcription regulating properties of a promoter, it is useful to operably link the promoter or its fragments to a reporter gene, which can be employed to monitor its expression both qualitatively and quantitatively. Preferably, bacterial β-glucuronidase is used (Jefferson 1987). β-glucuronidase activity can be monitored in planta with chromogenic substrates such as 5-bromo-4-chloro-3-indolyl-β-D-glucuronic acid during corresponding activity assays (Jefferson 1987). For determination of promoter activity and tissue specificity, plant tissue is dissected, stained and analyzed as described (e.g., Bäumlein 1991).

The regenerated transgenic T0 rice plants are used for reporter gene analysis.

General results for SEQ ID NO: 14: Medium-strong GUS expression is detected in all plant tissues analyzed.

General results for SEQ ID NO: 15: Medium-strong GUS expression is detected in all plant tissues analyzed.

General results for SEQ ID NO: 13: Medium-strong GUS expression is detected in all plant tissues analyzed.

1-14. (canceled)

15. A method for the production of one or more synthetic regulatory nucleic acid molecules of a defined specificity comprising the steps of:
a) identifying at least one naturally occurring nucleic acid molecule of the defined specificity; and
b) identifying conserved motifs in the at least one nucleic acid sequence of the nucleic acid molecule of the defined specificity as defined in a) (starting sequence); and
c) mutating the starting sequence while
   i) leaving at least 70% of the motifs unaltered known to be involved in regulation of the defined specificity; and
   ii) leaving at least 80% of the motifs unaltered involved in transcription initiation; and
   iii) leaving at least 10% of other identified motifs unaltered; and
   iv) keeping the arrangement of the identified motifs unaltered; and
   v) avoiding the introduction of new motifs known to influence expression with a specificity other than said defined specificity; and
   vi) avoiding identical stretches of more than 50 basepairs between each of the starting sequence and the one or more mutated sequences; and
d) producing a nucleic acid molecule comprising the mutated sequence; and
e) optionally testing the specificity of the mutated sequence in the respective organism.

16. The method of claim 15, wherein the stretch of identical basepairs is 40 basepairs or less.

17. A regulatory nucleic acid molecule produced by the method of claim 15.

18. An expression construct comprising the regulatory nucleic acid molecule of claim 17.

19. A vector comprising the regulatory nucleic acid molecule of claim 17.

20. A vector comprising the expression construct of claim 18.

21. A microorganism, plant cell, or animal cell comprising the regulatory nucleic acid molecule of claim 17.

22. A plant comprising the regulatory nucleic acid molecule of claim 17.

23. A regulatory nucleic acid molecule produced by the method of claim 15, wherein the regulatory nucleic acid molecule is selected from the group consisting of:
i) a nucleic acid molecule having a nucleic acid sequence selected from SEQ ID NOS: 2, 4, 6, 14, or 15; and
ii) a nucleic acid molecule having at least 250 consecutive base pairs of a nucleic acid sequence selected from SEQ ID NOS: 2, 4, 6, 14, or 15; and
iii) a nucleic acid molecule having an identity of at least 70% over a sequence of at least 250 consecutive nucleic acid base pairs to a nucleic acid sequence selected from SEQ ID NOS: 2, 4, 6, 14, or 15; and
iv) a nucleic acid molecule hybridizing under high stringent conditions with a nucleic acid molecule having at least 250 consecutive base pairs of a nucleic acid sequence selected from SEQ ID NOS: 2, 4, 6, 14, or 15; and
v) a complement of any of the nucleic acid molecules of i) to iv).

24. The regulatory nucleic acid molecule of claim 23, wherein the regulatory nucleic acid molecule is selected from the group consisting of:
i) a nucleic acid molecule having a nucleic acid sequence selected from SEQ ID NOS: 2, 4, 6, 14, or 15; and
ii) a nucleic acid molecule having at least 250 consecutive base pairs of a nucleic acid sequence selected from SEQ ID NOS: 2, 4, 6, 14, or 15; and
iii) a nucleic acid molecule having an identity of at least 75% over a sequence of at least 250 consecutive nucleic acid base pairs to the nucleic acid sequence of SEQ ID NO: 6; and
iv) a nucleic acid molecule having an identity of at least 90% over a sequence of at least 250 consecutive nucleic acid base pairs to a nucleic acid sequence selected from SEQ ID NOS: 2, 4, 14, or 15; and
v) a nucleic acid molecule hybridizing under high stringent conditions with a nucleic acid molecule having at least 250 consecutive base pairs of a nucleic acid sequence selected from SEQ ID NOS: 2, 4, 6, 14, or 15; and
vi) a complement of any of the nucleic acid molecules of i) to v).

25. The regulatory nucleic acid molecule of claim 23, wherein the regulatory nucleic acid molecule does not comprise a starting molecule having a nucleic acid sequence of SEQ ID NOS: 1, 3, 5, or 13 or a complement thereof or a nucleic acid molecule having at least 250 consecutive base pairs of a nucleic acid sequence of SEQ ID NOS: 1, 3, 5, or 13 or a complement thereof.

26. The regulatory nucleic acid molecule of claim 24, wherein the regulatory nucleic acid molecule does not comprise a starting molecule having a nucleic acid sequence of SEQ ID NOS: 1, 3, 5, or 13 or a complement thereof or a nucleic acid molecule having at least 250 consecutive base pairs of a nucleic acid sequence of SEQ ID NOS: 1, 3, 5, or 13 or a complement thereof.

27. An expression construct comprising a regulatory nucleic acid molecule of claim 23.

28. An expression construct comprising the regulatory nucleic acid molecule of claim 24.

29. An expression construct comprising the regulatory nucleic acid molecule of claim 25.

30. An expression construct comprising the regulatory nucleic acid molecule of claim 26.

31. A vector comprising the regulatory nucleic acid molecule of claim 23.

32. A vector comprising the regulatory nucleic acid molecule of claim 24.

33. A vector comprising the regulatory nucleic acid molecule of claim 25.

34. A vector comprising the regulatory nucleic acid molecule of claim 26.

35. A microorganism, plant cell or animal cell comprising the regulatory nucleic acid molecule of claim 23.

36. A microorganism, plant cell or animal cell comprising the regulatory nucleic acid molecule of claim 24.

37. A microorganism, plant cell or animal cell comprising the regulatory nucleic acid molecule of claim 25.

38. A microorganism, plant cell or animal cell comprising the regulatory nucleic acid molecule of claim 26.

39. A plant comprising the regulatory nucleic acid molecule of claim 23.

40. A plant comprising the regulatory nucleic acid molecule of claim 24.

41. A plant comprising the regulatory nucleic acid molecule of claim 25.

42. A plant comprising the regulatory nucleic acid molecule of claim 26.
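For illustration only, two of the sequence constraints recited in claim 15, step c), can be checked computationally. The sketch below is not part of the claims and makes several assumptions: the mutated sequence has the same length as the starting sequence (substitutions only), motifs are given as 1-based inclusive positions, and "identical stretch" is read as the longest run of bases shared by both sequences at any position. All function names and the toy sequences are hypothetical.

```python
from typing import List, Tuple


def longest_shared_stretch(a: str, b: str) -> int:
    """Length of the longest identical stretch (common substring) of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for base_a in a:
        cur = [0] * (len(b) + 1)
        for j, base_b in enumerate(b, start=1):
            if base_a == base_b:
                # Extend the run that ended at the previous base of both sequences.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def fraction_unaltered(
    start_seq: str, mutated_seq: str, motifs: List[Tuple[int, int]]
) -> float:
    """Fraction of motifs whose sequence is identical in both molecules.

    Assumes equal-length sequences and 1-based, inclusive motif positions.
    """
    if not motifs:
        return 1.0
    kept = sum(
        start_seq[start - 1:end] == mutated_seq[start - 1:end]
        for start, end in motifs
    )
    return kept / len(motifs)


if __name__ == "__main__":
    # Toy sequences, far shorter than a real promoter.
    starting = "ACGTACGTAC"
    mutated = "ACGTTCGTAC"
    toy_motifs = [(1, 4), (4, 6), (7, 10)]
    print(longest_shared_stretch(starting, mutated))
    print(fraction_unaltered(starting, mutated, toy_motifs))
```

With real promoter sequences, the first value would be compared against the 50 basepair limit of step c) vi) (or 40 basepairs in claim 16), and the second against the percentage thresholds of steps c) i) to iii) for the respective motif class.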